Abstract

We survey diverse approaches to the notion of information: from Shannon
entropy to Kolmogorov complexity. Two of the main applications of Kolmogorov
complexity are presented: randomness and classification. The
survey is divided into two parts in the same volume.

Part I is dedicated to information theory and the mathematical formalization
of randomness based on Kolmogorov complexity. This last application
goes back to the 60's and 70's with the work of Martin-Löf, Schnorr,
Chaitin, Levin, and has gained new impetus in recent years.

Keywords: Logic, Computer Science, Algorithmic Information Theory,

Note. Following Robert Soare's recommendations ([49], 1996), which have now
gained wide acceptance, we write computable and computably enumerable in
place of the old-fashioned recursive and recursively enumerable.

Notation. By $\log x$ (resp. $\log_s x$) we mean the logarithm of $x$ in base 2 (resp.
base $s$ where $s \geq 2$). The "floor" and "ceil" of a real number $x$ are denoted
by $\lfloor x \rfloor$ and $\lceil x \rceil$: they are respectively the largest integer $\leq x$ and the smallest
integer $\geq x$. Recall that, for $s \geq 2$, the length of the base $s$ representation of
a positive integer $k$ is $\ell \geq 1$ if and only if $s^{\ell-1} \leq k < s^{\ell}$. Thus, the length of the base
$s$ representation of a positive integer $k$ is $1 + \lfloor \log_s k \rfloor = 1 + \left\lfloor \frac{\log k}{\log s} \right\rfloor$.
The number of elements of a finite family $\mathcal{F}$ is denoted by $\#\mathcal{F}$.
The length of a word $u$ is denoted by $|u|$.

1 Three approaches to a quantitative definition of information

A title borrowed from Kolmogorov's seminal paper ([25], 1965).

1.1 Which information? 
1.1.1 About anything... 

Almost anything can be seen as conveying information. As usual in mathematical
modeling, we retain only a few features of some real entity or process, and
associate with them some finite or infinite mathematical objects. For instance,

• - an integer or a rational number or a word in some alphabet, 

- a finite sequence or a finite set of such objects, 

- a finite graph,... 

• - a real, 

- a finite or infinite sequence of reals or a set of reals, 

- a function over words or numbers,... 

This is very much as with probability spaces. For instance, to model the
distributions of 6 balls into 3 cells (cf. Feller, [15], §1.2, II.5), we forget everything
about the nature of balls and cells and of the distribution process, retaining
only two questions: "how many balls in each cell?" and "are the balls and cells
distinguishable or not?". Accordingly, the model considers

- either the $729 = 3^6$ maps from the set of balls into the set of cells in case the
balls are distinguishable and so are the cells (this is what is done in
Maxwell-Boltzmann statistics),

- or the $28 = \binom{6 + (3-1)}{6}$ triples¹ of non-negative integers with sum 6 in
case the cells are distinguishable but not the balls (this is what is done in
Bose-Einstein statistics),

- or the 7 multisets of at most 3 positive integers with sum 6 in case the balls are
indistinguishable and so are the cells.

1 This value is easily obtained by identifying such a triple with a binary word with six
letters 0 for the six balls and two letters 1 to mark the partition in the three cells.
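The three counts above can be checked by brute-force enumeration. The sketch below uses only the Python standard library; the variable names are ours. It enumerates the maps, the occupancy triples, and the occupancy triples taken up to a permutation of the cells:

```python
from itertools import product
from math import comb

BALLS, CELLS = 6, 3

# Maxwell-Boltzmann: every map from the 6 distinguishable balls to the 3 distinguishable cells.
maxwell_boltzmann = sum(1 for _ in product(range(CELLS), repeat=BALLS))

# Bose-Einstein: occupancy triples (k1, k2, k3) of non-negative integers with sum 6.
triples = [t for t in product(range(BALLS + 1), repeat=CELLS) if sum(t) == BALLS]
bose_einstein = len(triples)

# Indistinguishable cells as well: triples taken up to permutation (sort them).
both_indistinguishable = len({tuple(sorted(t)) for t in triples})

print(maxwell_boltzmann, bose_einstein, both_indistinguishable)  # 729 28 7
print(comb(BALLS + CELLS - 1, BALLS))  # 28, the count explained in footnote 1
```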

1.1.2 Especially words 

In information theory, special emphasis is placed on information conveyed by
words on finite alphabets, i.e., on sequential information as opposed to the
obviously massively parallel and interactive distribution of information in real
entities and processes. A drastic reduction which allows for mathematical
developments (but also illustrates the Italian saying "traduttore, traditore!").

As has been widely popularized by computer science, any finite alphabet with more
than two letters can be reduced to one with exactly two letters. For instance, 
as exemplified by the ASCII code (American Standard Code for Information 
Interchange), any symbol used in written English - namely the lowercase and 
uppercase letters, the decimal digits, the diverse punctuation marks, the space, 
apostrophe, quote, left and right parentheses - together with some simple ty- 
pographical commands - such as tabulation, line feed, carriage return or "end 
of file" - can be coded by binary words of length 7 (corresponding to the 128 
ASCII codes). This leads to a simple way to code any English text by a binary
word (which is 7 times longer)².
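As a small illustration of this 7-bit coding, here is a sketch in Python (the function names are ours, not a standard API). It concatenates the 7-bit ASCII codes of the characters and decodes by cutting the binary word into blocks of 7 bits:

```python
def ascii7_encode(text: str) -> str:
    """Concatenate the 7-bit binary ASCII codes of the characters of text."""
    bits = []
    for ch in text:
        code = ord(ch)
        if code >= 128:
            raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
        bits.append(format(code, "07b"))
    return "".join(bits)

def ascii7_decode(bits: str) -> str:
    """Inverse of ascii7_encode: cut the binary word into blocks of 7 bits."""
    if len(bits) % 7:
        raise ValueError("length is not a multiple of 7")
    return "".join(chr(int(bits[i:i + 7], 2)) for i in range(0, len(bits), 7))

word = ascii7_encode("War and Peace")
print(len(word))  # 91 = 7 * 13 characters
assert ascii7_decode(word) == "War and Peace"
```

Fixed-length blocks make decoding trivial, which is exactly what the constant-length codes of §1.2.1 generalize.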

Though quite rough, the length of a word is the basic measure of its information
content. Now, a fairness issue faces us: the richer the alphabet, the shorter the word.
Considering groups of $k$ successive letters as new letters of a super-alphabet, one
trivially divides the length by $k$. For instance, a length $n$ binary word becomes
a length $\lceil n/8 \rceil$ word with the usual packing of bits by groups of 8 (called bytes)
which is done in computers.

This is why all considerations about the length of words will always be devel- 
oped relative to binary alphabets. A choice to be considered as a normalization 
of length. 

Finally, we come to the basic idea to measure the information content of a
mathematical object $x$:

$$\text{information content of } x = \text{length of a shortest binary word which ``encodes'' } x$$

What precisely we mean by "encodes" is the crucial question. Following the
trichotomy pointed out by Kolmogorov in [25], 1965, we survey three approaches.

2 For other European languages which have a lot of diacritic marks, one has to consider the 
256 codes of Extended ASCII which have length 8. And for non European languages, one has 
to turn to the 65 536 codes of Unicode which have length 16. 
1.2 Combinatorial approach: entropy

1.2.1 Constant-length codes 

Let us consider the family $A^n$ of length $n$ words in an alphabet $A$ with $s$ letters
$a_1, \ldots, a_s$. Coding the $a_i$'s by binary words $w_i$ all of length $\lceil \log s \rceil$, to any
word $u$ in $A^n$ we can associate the binary word $\xi$ obtained by substituting the
$w_i$'s for the occurrences of the $a_i$'s in $u$. Clearly, $\xi$ has length $n\lceil \log s \rceil$. Also,
the map $u \mapsto \xi$ from the set $A^*$ of words in alphabet $A$ to the set $\{0,1\}^*$ of
binary words is very simple. Mathematically, considering on $A^*$ and $\{0,1\}^*$
the algebraic structure of monoid given by the concatenation product of words,
this map $u \mapsto \xi$ is a morphism since the image of a concatenation $uv$ is the
concatenation of the images of $u$ and $v$.
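A minimal sketch of such a constant-length code follows, assuming that the $i$-th letter of the alphabet is coded by $i$ written in binary on $\lceil \log s \rceil$ bits. This is one possible choice of the $w_i$'s, not the only one. The last assertion checks the morphism property:

```python
def constant_length_code(alphabet: str) -> dict:
    """Code the i-th letter by i written in binary on ceil(log2 s) bits (s >= 2)."""
    s = len(alphabet)
    if s < 2:
        raise ValueError("this sketch assumes an alphabet with at least 2 letters")
    width = (s - 1).bit_length()          # equals ceil(log2 s) for s >= 2, integers only
    return {a: format(i, f"0{width}b") for i, a in enumerate(alphabet)}

def encode(u: str, code: dict) -> str:
    """Substitute the codeword of each letter: the morphism u -> xi."""
    return "".join(code[a] for a in u)

code = constant_length_code("abcde")      # s = 5, so every codeword has 3 bits
print(code["a"], code["e"])               # 000 100
u, v = "bad", "cab"
assert len(encode(u, code)) == 3 * len(u)                       # |xi| = n * ceil(log s)
assert encode(u + v, code) == encode(u, code) + encode(v, code)  # morphism property
```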

1.2.2 Variable-length prefix codes 

Instead of coding the $s$ letters of $A$ by binary words of length $\lceil \log s \rceil$, one can
code the $a_i$'s by binary words $w_i$'s having different lengths so as to associate
short codes to most frequent letters and long codes to rare ones. This is the
basic idea of compression. Using such codes, the substitution of the $w_i$'s for
the occurrences of the $a_i$'s in a word $u$ gives a binary word $\xi$. And the map
$u \mapsto \xi$ is again very simple. It is still a morphism from the monoid of words on
alphabet $A$ to the monoid of binary words and can also be computed by a finite
automaton.

Now, we face a problem: can we recover $u$ from $\xi$? I.e., is the map $u \mapsto \xi$
injective? In general the answer is no. However, a simple sufficient condition to
ensure decoding is that the family $w_1, \ldots, w_s$ be a so-called prefix-free code (or
prefix code), which means that if $i \neq j$ then $w_i$ is not a prefix of $w_j$.

This condition ensures that there is a unique $w_{i_1}$ which is a prefix
of $\xi$. Then, considering the associated suffix $\xi_1$ of $\xi$ (i.e., $\xi = w_{i_1}\xi_1$),
there is a unique $w_{i_2}$ which is a prefix of $\xi_1$, i.e., $\xi$ is of the form
$\xi = w_{i_1} w_{i_2} \xi_2$. And so on.
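The decoding argument above translates directly into a left-to-right scan. The following Python sketch assumes the code is given as a dictionary from letters to codewords; the example code is hypothetical. As soon as the bits read so far form a codeword, that codeword is the unique one which is a prefix of what remains:

```python
def is_prefix_free(words) -> bool:
    """True iff, for i != j, w_i is not a prefix of w_j."""
    return not any(i != j and wj.startswith(wi)
                   for i, wi in enumerate(words) for j, wj in enumerate(words))

def decode(xi: str, code: dict) -> str:
    """Peel off, left to right, the unique codeword that is a prefix of what remains."""
    if not is_prefix_free(list(code.values())):
        raise ValueError("code is not prefix-free: decoding may be ambiguous")
    inverse = {w: a for a, w in code.items()}
    letters, current = [], ""
    for bit in xi:
        current += bit
        if current in inverse:        # first match is the unique codeword prefix
            letters.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(letters)

code = {"a": "0", "b": "10", "c": "110", "d": "111"}   # a hypothetical prefix-free code
xi = "".join(code[x] for x in "abacad")
print(xi)                                  # 01001100111
assert decode(xi, code) == "abacad"
```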

Suppose the numbers of occurrences in $u$ of the letters $a_1, \ldots, a_s$ are $m_1, \ldots, m_s$,
so that the length of $u$ is $n = m_1 + \ldots + m_s$. Using a prefix-free code $w_1, \ldots, w_s$,
the binary word $\xi$ associated to $u$ has length $m_1|w_1| + \ldots + m_s|w_s|$. A natural
question is, given $m_1, \ldots, m_s$, how to choose the prefix-free code $w_1, \ldots, w_s$ so as
to minimize the length of $\xi$?

Huffman ([21], 1952) found a very efficient algorithm (which has linear time
complexity if the frequencies are already ordered). This algorithm (suitably
modified to keep its top efficiency for words containing long runs of the same
data) is nowadays used in nearly every application that involves the compression
and transmission of data: fax machines, modems, networks,...
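A compact sketch of Huffman's algorithm: repeatedly merge the two least frequent subtrees, prefixing 0 to the codewords of one and 1 to those of the other. This version uses Python's `heapq`, so it runs in $O(s \log s)$ rather than the linear-time variant for pre-sorted frequencies mentioned above. The occurrence counts are a hypothetical example:

```python
import heapq
from itertools import count

def huffman_code(freqs: dict) -> dict:
    """Huffman's algorithm; freqs maps each letter to its number of occurrences (> 0)."""
    if len(freqs) == 1:                      # degenerate alphabet: a single 1-bit codeword
        return {a: "0" for a in freqs}
    tie = count()                            # tie-breaker so the heap never compares dicts
    heap = [(m, next(tie), {a: ""}) for a, m in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        m1, _, left = heapq.heappop(heap)    # the two least frequent subtrees
        m2, _, right = heapq.heappop(heap)
        merged = {a: "0" + w for a, w in left.items()}
        merged.update({a: "1" + w for a, w in right.items()})
        heapq.heappush(heap, (m1 + m2, next(tie), merged))
    return heap[0][2]

freqs = {"a": 5, "b": 2, "c": 1, "d": 1}     # hypothetical counts m_1, ..., m_4
code = huffman_code(freqs)
print(code)                                   # codeword lengths 1, 2, 3, 3
print(sum(m * len(code[a]) for a, m in freqs.items()))  # |xi| = 15
```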
1.2.3 Entropy of a distribution of frequencies

The intuition of the notion of entropy in information theory is as follows. Given
natural integers $m_1, \ldots, m_s$, consider the family $\mathcal{F}_{m_1,\ldots,m_s}$ of length $n = m_1 + \ldots + m_s$
words of the alphabet $A$ in which there are exactly $m_1, \ldots, m_s$ occurrences of
letters $a_1, \ldots, a_s$. How many binary digits are there in the binary representation
of the number of words in $\mathcal{F}_{m_1,\ldots,m_s}$? It happens (cf. Proposition 1.2) that
this number is essentially linear in $n$, the coefficient of $n$ depending solely on
the frequencies $\frac{m_1}{n}, \ldots, \frac{m_s}{n}$. It is this coefficient which is called the entropy $H$ of
the distribution of the frequencies $\frac{m_1}{n}, \ldots, \frac{m_s}{n}$.

Definition 1.1 (Shannon, [48], 1948). Let $f_1, \ldots, f_s$ be a distribution of
frequencies, i.e., a sequence of reals in $[0,1]$ such that $f_1 + \ldots + f_s = 1$. The entropy of
$f_1, \ldots, f_s$ is the real

$$H = -(f_1 \log(f_1) + \ldots + f_s \log(f_s))$$
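Definition 1.1 transcribes directly into Python. The sketch below uses base-2 logarithms as in the Notation paragraph, and applies the convention $0 \log 0 = 0$ by skipping zero frequencies:

```python
from math import isclose, log2

def entropy(freqs) -> float:
    """Shannon entropy H = -(f_1 log f_1 + ... + f_s log f_s), logs in base 2.

    Uses the convention 0 log 0 = 0 by skipping zero frequencies.
    """
    if any(f < 0 for f in freqs) or not isclose(sum(freqs), 1.0):
        raise ValueError("not a distribution of frequencies")
    return sum(-f * log2(f) for f in freqs if f > 0)

print(entropy([1 / 4] * 4))             # 2.0, i.e. log2(4): the uniform case
print(entropy([1 / 2, 1 / 4, 1 / 4]))   # 1.5
print(entropy([1.0, 0.0, 0.0]))         # 0.0: one frequency equal to 1
```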

Proposition 1.2 (Shannon, [48], 1948). Let $m_1, \ldots, m_s$ be natural integers and
$n = m_1 + \ldots + m_s$. Then, letting $H$ be the entropy of the distribution of
frequencies $\frac{m_1}{n}, \ldots, \frac{m_s}{n}$, the number $\#\mathcal{F}_{m_1,\ldots,m_s}$ of words in $\mathcal{F}_{m_1,\ldots,m_s}$ satisfies

$$\log(\#\mathcal{F}_{m_1,\ldots,m_s}) = nH + O(\log n)$$

where the bound in $O(\log n)$ depends solely on $s$ and not on $m_1, \ldots, m_s$.

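Proof. The set $\mathcal{F}_{m_1,\ldots,m_s}$ contains $\frac{n!}{m_1! \times \ldots \times m_s!}$ words. Letters with
$m_i = 0$ play no role and can be discarded, so we assume all $m_i \geq 1$. Using
Stirling's approximation of the factorial function (cf. [15]), namely
$x! = \sqrt{2\pi}\, x^{x+\frac{1}{2}} e^{-x+\frac{\theta}{12x}}$ where $0 < \theta < 1$, and equality $n = m_1 + \ldots + m_s$,
we get

$$\log\left(\frac{n!}{m_1! \times \ldots \times m_s!}\right) = \Big(\sum_i m_i\Big)\log(n) - \sum_i m_i \log(m_i) - \frac{1}{2}\log\left(\frac{m_1 \times \ldots \times m_s}{n}\right) - (s-1)\log\sqrt{2\pi} + \alpha$$

where $|\alpha| \leq \frac{s}{12}\log e$. The difference of the first two terms is equal
to $n\left[\sum_i \frac{m_i}{n} \log\left(\frac{n}{m_i}\right)\right] = nH$ and the remaining sum is $O(\log n)$ since
$\frac{1}{s} \leq \frac{m_1 \times \ldots \times m_s}{n} \leq n^{s-1}$. □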
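Proposition 1.2 can be probed numerically. The sketch below computes $\#\mathcal{F}_{m_1,\ldots,m_s}$ exactly as a product of binomial coefficients, then compares $\log(\#\mathcal{F})$ with $nH$ for hypothetical counts $m = (k, 2k, 3k)$. The gap $nH - \log(\#\mathcal{F})$ should grow only like $\log n$. We only claim that the printed ratio stays bounded; for these particular counts it approaches 1:

```python
from math import comb, log2

def count_words(ms) -> int:
    """#F_{m_1,...,m_s} = n! / (m_1! ... m_s!), as an exact product of binomials."""
    total, result = 0, 1
    for m in ms:
        total += m
        result *= comb(total, m)
    return result

def entropy_of_counts(ms) -> float:
    n = sum(ms)
    return sum(-(m / n) * log2(m / n) for m in ms if m > 0)

for k in (10, 100, 1000, 10000):
    ms = [k, 2 * k, 3 * k]                   # frequencies 1/6, 1/3, 1/2 (s = 3)
    n = sum(ms)
    gap = n * entropy_of_counts(ms) - log2(count_words(ms))
    print(n, round(gap, 2), round(gap / log2(n), 3))
```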



$H$ has a striking significance in terms of information content and compression.

Any word $u$ in $\mathcal{F}_{m_1,\ldots,m_s}$ is uniquely characterized by its rank in this family
(say relative to the lexicographic ordering on words in alphabet $A$). In
particular, the binary representation of this rank "encodes" $u$. Since this rank
is $< \#\mathcal{F}_{m_1,\ldots,m_s}$, its binary representation has length $\leq nH$ up to an $O(\log n)$
term. Thus, $nH$ can be seen as an upper bound of the information content
of $u$. Otherwise said, the $n$ letters of $u$ are encoded by $nH$ binary digits. In
terms of compression (nowadays so popular with zip-like software), $u$ can
be compressed to $nH$ bits, i.e., the mean information content (which can be seen
as the compression size in bits) of a letter of $u$ is $H$.
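The encoding by rank can be made concrete. The following Python sketch ranks and unranks a word among all words with the same letter counts, in lexicographic order. The helper names are ours, and the alphabet is assumed to be listed in increasing order:

```python
from math import comb

def count_words(counts: dict) -> int:
    """Number of words with exactly counts[a] occurrences of each letter a."""
    total, result = 0, 1
    for m in counts.values():
        total += m
        result *= comb(total, m)
    return result

def rank(u: str, alphabet: str) -> int:
    """Lexicographic rank of u among the words with the same letter counts."""
    counts = {a: u.count(a) for a in alphabet}
    r = 0
    for letter in u:
        for smaller in alphabet:                 # alphabet listed in increasing order
            if smaller == letter:
                break
            if counts[smaller] > 0:              # words with `smaller` here come first
                counts[smaller] -= 1
                r += count_words(counts)
                counts[smaller] += 1
        counts[letter] -= 1
    return r

def unrank(r: int, counts: dict, alphabet: str) -> str:
    """Inverse of rank: rebuild the word from its rank and its letter counts."""
    counts, letters = dict(counts), []
    for _ in range(sum(counts.values())):
        for a in alphabet:
            if counts[a] == 0:
                continue
            counts[a] -= 1
            block = count_words(counts)          # words continuing with letter a
            if r < block:
                letters.append(a)
                break
            r -= block
            counts[a] += 1
    return "".join(letters)

counts = {"a": 3, "b": 1, "n": 2}
r = rank("banana", "abn")
print(r, count_words(counts))        # 34 60: the rank is < 60 = 6! / (3! 1! 2!)
assert unrank(r, counts, "abn") == "banana"
```

Since the rank is below 60, it fits in $\lceil \log 60 \rceil = 6$ bits, versus 12 bits for the constant-length code of §1.2.1 on this 3-letter alphabet. The letter counts must still be transmitted, which is the point taken up in §1.2.5.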
Let us look at two extreme cases.

• If all frequencies $f_i$ are equal to $\frac{1}{s}$ then the entropy is $\log(s)$, so that the mean
information content of a letter of $u$ is $\log(s)$, i.e., there is no better (prefix-free)
coding than that described in §1.2.1.

• In case some frequency is 1 (hence all other ones being 0), the information
content of $u$ is reduced to its length $n$, which, written in binary, requires
$\log(n)$ bits. As for the entropy, it is 0 (with the usual convention $0 \log 0 = 0$,
justified by the fact that $\lim_{x \to 0} x \log x = 0$). The discrepancy between $nH = 0$
and the true information content $\log n$ comes from the $O(\log n)$ term in
Proposition 1.2.

1.2.4 Shannon's source coding theorem for symbol codes 

The significance of the entropy explained above has been given a remarkable and
precise form by Claude Elwood Shannon (1916-2001) in his celebrated paper
[48], 1948. It is about the length of the binary word $\xi$ associated to $u$ via a
prefix-free code. Shannon proved

- a lower bound of $|\xi|$ valid whatever the prefix-free code $w_1, \ldots, w_s$,

- an upper bound, quite close to the lower bound, valid for particular prefix-free
codes $w_1, \ldots, w_s$ (those making $\xi$ shortest possible, for instance those given by
Huffman's algorithm).

Theorem 1.3 (Shannon, [48], 1948). Suppose the numbers of occurrences in $u$
of the letters $a_1, \ldots, a_s$ are $m_1, \ldots, m_s$. Let $n = m_1 + \ldots + m_s$.

1. For every prefix-free sequence of binary words $w_1, \ldots, w_s$ (which are to code
the letters $a_1, \ldots, a_s$), the binary word $\xi$ obtained by substituting $w_i$ for each
occurrence of $a_i$ in $u$ satisfies

$$nH \leq |\xi|$$

where $H = -\left(\frac{m_1}{n}\log\left(\frac{m_1}{n}\right) + \ldots + \frac{m_s}{n}\log\left(\frac{m_s}{n}\right)\right)$ is the entropy of the considered
distribution of frequencies $\frac{m_1}{n}, \ldots, \frac{m_s}{n}$.

2. There exists a prefix-free sequence of binary words $w_1, \ldots, w_s$ such that

$$nH \leq |\xi| < n(H + 1)$$
Proof. First, we recall two classical results.

Kraft's inequality. Let $\ell_1, \ldots, \ell_s$ be a finite sequence of natural integers.
Inequality $2^{-\ell_1} + \ldots + 2^{-\ell_s} \leq 1$ holds if and only if there exists a
prefix-free sequence of binary words $w_1, \ldots, w_s$ such that $\ell_1 = |w_1|, \ldots, \ell_s = |w_s|$.

Gibbs' inequality. Let $p_1, \ldots, p_s$ and $q_1, \ldots, q_s$ be two probability
distributions, i.e., the $p_i$'s (resp. $q_i$'s) are in $[0,1]$ and have sum 1. Then
$-\sum_i p_i \log(p_i) \leq -\sum_i p_i \log(q_i)$, with equality if and only if $p_i = q_i$
for all $i$.
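The "if" direction of Kraft's inequality has a classical constructive proof. Treat the lengths in increasing order, and give each word the first $\ell$ bits of the binary expansion of the sum of $2^{-\ell_j}$ over the words already chosen. Here is a sketch in Python using exact rational arithmetic; this particular construction is one standard choice, not necessarily the one the cited sources use:

```python
from fractions import Fraction

def kraft_sum(lengths) -> Fraction:
    return sum((Fraction(1, 2 ** l) for l in lengths), Fraction(0))

def prefix_code_from_lengths(lengths):
    """Build a prefix-free code with the given codeword lengths (the 'if' direction).

    Treat lengths in increasing order; the next codeword is the first l bits of the
    binary expansion of the sum of 2^-l_j over the codewords already chosen.
    """
    if kraft_sum(lengths) > 1:
        raise ValueError("Kraft's inequality fails: no prefix-free code has these lengths")
    words = [""] * len(lengths)
    acc = Fraction(0)
    for i in sorted(range(len(lengths)), key=lambda j: lengths[j]):
        l = lengths[i]
        # acc * 2^l is an integer < 2^l, because all earlier lengths are <= l
        # and acc + 2^-l <= 1.
        if l > 0:
            words[i] = format(int(acc * 2 ** l), f"0{l}b")
        acc += Fraction(1, 2 ** l)
    return words

print(prefix_code_from_lengths([2, 1, 3, 3]))   # ['10', '0', '110', '111']
```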
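Proof of Point 1 of Theorem 1.3. Set $p_i = \frac{m_i}{n}$ and $q_i = \frac{2^{-|w_i|}}{S}$ where
$S = \sum_i 2^{-|w_i|}$. Then

$$|\xi| = \sum_i m_i |w_i| = n\Big[\sum_i \frac{m_i}{n}\big(-\log(q_i) - \log S\big)\Big] \geq n\Big[-\Big(\sum_i \frac{m_i}{n}\log\Big(\frac{m_i}{n}\Big)\Big) - \log S\Big] = n[H - \log S] \geq nH$$

The first inequality is an instance of Gibbs' inequality. For the last
one, observe that $S \leq 1$ by Kraft's inequality, so that $\log S \leq 0$.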

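Proof of Point 2 of Theorem 1.3. Set $\ell_i = \left\lceil -\log\left(\frac{m_i}{n}\right) \right\rceil$. Observe that
$2^{-\ell_i} \leq \frac{m_i}{n}$. Thus, $2^{-\ell_1} + \ldots + 2^{-\ell_s} \leq 1$. Applying Kraft's inequality,
we see that there exists a prefix-free family of words $w_1, \ldots, w_s$ with
lengths $\ell_1, \ldots, \ell_s$.

We consider the binary word $\xi$ obtained via this prefix-free code,
i.e., $\xi$ is obtained by substituting $w_i$ for each occurrence of $a_i$ in $u$.
Observe that $-\log\left(\frac{m_i}{n}\right) \leq \ell_i < -\log\left(\frac{m_i}{n}\right) + 1$. Multiplying by $m_i$ and summing, we get
$nH \leq |\xi| < n(H + 1)$. □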
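The code lengths used in this proof (often called Shannon code lengths) are easy to compute and check. The sketch below uses hypothetical occurrence counts. It computes $\ell_i = \lceil -\log(m_i/n) \rceil$ with integer arithmetic only, checks Kraft's inequality, and verifies $nH \leq |\xi| < n(H+1)$. These lengths are not optimal in general; Huffman's algorithm may do better:

```python
from fractions import Fraction
from math import log2

def shannon_lengths(ms):
    """l_i = ceil(-log2(m_i / n)), i.e. the least l with m_i * 2^l >= n (integers only)."""
    n = sum(ms)
    lengths = []
    for m in ms:
        if m <= 0:
            raise ValueError("every letter is assumed to occur at least once")
        l = 0
        while m * 2 ** l < n:
            l += 1
        lengths.append(l)
    return lengths

ms = [5, 2, 1, 1]                      # hypothetical occurrence counts m_1, ..., m_4
n = sum(ms)
lengths = shannon_lengths(ms)          # [1, 3, 4, 4]
assert sum(Fraction(1, 2 ** l) for l in lengths) <= 1   # Kraft: a prefix code exists
H = sum(-(m / n) * log2(m / n) for m in ms)
xi_length = sum(m * l for m, l in zip(ms, lengths))
print(round(n * H, 2), xi_length, round(n * (H + 1), 2))  # nH <= |xi| < n(H + 1)
```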

In particular cases, the lower bound nH can be achieved. 

Theorem 1.4. In case the frequencies $\frac{m_i}{n}$'s are all negative powers of two (i.e.,
$\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots$) then the optimal $\xi$ (given by Huffman's algorithm) satisfies $|\xi| = nH$.

1.2.5 Closer to the entropy 

In §1.2.3 and §1.2.4 we supposed the frequencies to be known and did not
consider the information content of these frequencies. We now deal with that
question.

Let us go back to the encoding mentioned at the start of §1.2.3. A word $u$ in
the family $\mathcal{F}_{m_1,\ldots,m_s}$ (of length $n$ words with exactly $m_1, \ldots, m_s$ occurrences of
$a_1, \ldots, a_s$) can be recovered from the following data:

- the values of $m_1, \ldots, m_s$,

- the rank of $u$ in $\mathcal{F}_{m_1,\ldots,m_s}$ (relative to the lexicographic order on words).

We have seen (cf. Proposition 1.2) that the rank of $u$ has a binary representation
$\rho$ of length $\leq nH + O(\log n)$. The integers $m_1, \ldots, m_s$ are encoded by their
binary representations $\mu_1, \ldots, \mu_s$ which all have length $\leq 1 + \lfloor \log n \rfloor$. Now, to
encode $m_1, \ldots, m_s$ and the rank of $u$, we cannot just concatenate $\mu_1, \ldots, \mu_s, \rho$:
how would we know where $\mu_1$ stops, where $\mu_2$ starts, ..., in the word obtained
by concatenation? Several tricks are possible to overcome the problem; they are
described in §1.2.6. Using Proposition 1.5, we set $\xi = \langle \mu_1, \ldots, \mu_s, \rho \rangle$, which has
length $|\xi| = |\rho| + O(|\mu_1| + \ldots + |\mu_s|) = nH + O(\log n)$ (Proposition 1.6 gives a
much better bound but this is of no use here). Then $u$ can be recovered from $\xi$,
which is a binary word of length $nH + O(\log n)$. Thus, asymptotically, we get
a better upper bound than $n(H + 1)$, the one given by Shannon for prefix-free
codes (cf. Theorem 1.3).

Of course, $\xi$ is no longer obtained from $u$ via a morphism (i.e., a map which
preserves concatenation of words) between the monoid of words in alphabet $A$
and that of binary words.

Notice that this also shows that prefix-free codes are not the only way to
efficiently encode into a binary word $\xi$ a word $u$ from alphabet $a_1, \ldots, a_s$ for which
the numbers $m_1, \ldots, m_s$ of occurrences of the $a_i$'s are known.

1.2.6 Coding finitely many words with one word

How can we code two words $u, v$ with only one word? The simplest way is to
consider $u\$v$ where $\$$ is a fresh symbol outside the alphabet of $u$ and $v$. But
what if we want to stick to binary words? As said above, the concatenation of
$u$ and $v$ does not do the job: how can one recover the prefix $u$ in $uv$? A simple
trick is to also concatenate the length $|u|$ written in unary and delimit it by a
zero. Indeed, denoting by $1^p$ the word $\underbrace{1 \ldots 1}_{p \text{ times}}$, one can recover $u$ and $v$ from
the word $1^{|u|}0uv$: the length of the first block of 1's tells where to stop in the
suffix $uv$ to get $u$. In other words, the map $(u,v) \mapsto 1^{|u|}0uv$ from
$\{0,1\}^* \times \{0,1\}^*$ to $\{0,1\}^*$ is injective. In this way, the code of the pair $(u,v)$ has length
$2|u| + |v| + 1$. This can be extended to more arguments using the map
$(u_1, \ldots, u_s, v) \mapsto 1^{|u_1|}0\,1^{|u_2|}0 \ldots 1^{|u_s|}0\,u_1 \ldots u_s v$, which delimits each unary
block by its own zero (so that empty $u_i$'s cause no ambiguity) and has length
$2(|u_1| + \ldots + |u_s|) + |v| + s$.

Proposition 1.5. Let $s \geq 1$. There exists a map $\langle\ \rangle : (\{0,1\}^*)^{s+1} \to \{0,1\}^*$
which is injective and computable and such that, for all $u_1, \ldots, u_s, v \in \{0,1\}^*$,

$$|\langle u_1, \ldots, u_s, v \rangle| = 2(|u_1| + \ldots + |u_s|) + |v| + s$$
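Here is a sketch in Python of a map of this kind, in which each length is written in unary and terminated by its own 0. For $s = 1$ it is exactly the map $(u,v) \mapsto 1^{|u|}0uv$ described above. The function names are ours, and the decoder assumes it is given a valid code word and the value of $s$:

```python
def encode_tuple(us, v: str) -> str:
    """<u_1,...,u_s,v> = 1^{|u_1|}0 1^{|u_2|}0 ... 1^{|u_s|}0 u_1 ... u_s v."""
    header = "".join("1" * len(u) + "0" for u in us)
    return header + "".join(us) + v

def decode_tuple(word: str, s: int):
    """Recover ([u_1,...,u_s], v) from the code word, knowing s."""
    lengths, pos = [], 0
    for _ in range(s):
        start = pos
        while word[pos] == "1":        # unary length, terminated by a 0
            pos += 1
        lengths.append(pos - start)
        pos += 1                       # skip the delimiting 0
    us = []
    for length in lengths:
        us.append(word[pos:pos + length])
        pos += length
    return us, word[pos:]

code = encode_tuple(["101", ""], "0011")
print(code)                            # 111001010011, of length 2 * 3 + 4 + 2 = 12
assert decode_tuple(code, 2) == (["101", ""], "0011")
```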

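The following technical improvement will be needed in Part II §2.1.

Proposition 1.6. Let $s \geq 1$. There exists a map $\langle\ \rangle : (\{0,1\}^*)^{s+1} \to \{0,1\}^*$ which is
injective and computable and such that, for all $u_1, \ldots, u_s, v \in \{0,1\}^*$,

$$|\langle u_1, \ldots, u_s, v \rangle| = (|u_1| + \ldots + |u_s|) + (\log|u_1| + \ldots + \log|u_s|) + 2(\log\log|u_1| + \ldots + \log\log|u_s|) + |v| + O(1)$$

Proof. We consider the case $s = 1$, i.e., we want to code a pair
$(u,v)$. Instead of putting the prefix $1^{|u|}0$, let us put the binary
representation $\beta(|u|)$ of the number $|u|$ prefixed by its length. This
gives the more complex code $1^{|\beta(|u|)|}0\beta(|u|)uv$ with length

$$|u| + |v| + 2(\lfloor \log|u| \rfloor + 1) + 1 \leq |u| + |v| + 2\log|u| + 3$$

The first block of ones gives the length of $\beta(|u|)$. Using this length,
we can get $\beta(|u|)$ as the factor following this first block of ones. Now,
$\beta(|u|)$ is the binary representation of $|u|$, so we get $|u|$ and can now
separate $u$ and $v$ in the suffix $uv$. To reach the bound of the proposition,
apply the same trick once more: code the length $\ell = |\beta(|u|)|$ itself by
$1^{|\beta(\ell)|}0\beta(\ell)$, i.e., use $1^{|\beta(\ell)|}0\beta(\ell)\beta(|u|)uv$, whose length is at most
$|u| + |v| + \log|u| + 2\log\log|u| + O(1)$. The case $s \geq 2$ is similar. □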
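The iterated length-prefix trick can be sketched in Python as follows, for the case $s = 1$. The code word is $1^{|\beta(\ell)|}0\beta(\ell)\beta(|u|)uv$ with $\ell = |\beta(|u|)|$, where `beta` is our name for the binary representation (with $\beta(0) = 0$ as an assumption of this sketch):

```python
def beta(n: int) -> str:
    """Binary representation of n >= 0 (beta(0) = '0')."""
    return format(n, "b")

def encode_pair(u: str, v: str) -> str:
    """1^{|beta(l)|} 0 beta(l) beta(|u|) u v, where l = |beta(|u|)|."""
    b = beta(len(u))
    bb = beta(len(b))
    return "1" * len(bb) + "0" + bb + b + u + v

def decode_pair(word: str):
    k = word.index("0")                  # k = |beta(l)|, read in unary
    pos = k + 1
    l = int(word[pos:pos + k], 2)        # l = |beta(|u|)|
    pos += k
    n = int(word[pos:pos + l], 2)        # n = |u|
    pos += l
    return word[pos:pos + n], word[pos + n:]

u = "0" * 1000
overhead = len(encode_pair(u, "")) - len(u)
print(overhead)    # 19 bits, versus 1001 for the unary prefix 1^{|u|}0
```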

1.3 Probabilistic approach: ergodicity and lossy coding 

The abstract probabilistic approach allows for considerable extensions of the
results described in §1.2.

First, the restriction to fixed given frequencies can be relaxed. The probability
of writing $a_i$ may depend on what has already been written. For instance,
Shannon's source coding theorem has been extended to the so-called "ergodic
asymptotically mean stationary source models".

Second, one can consider lossy coding: some length $n$ words in alphabet $A$ are
ill-treated or ignored. Let $\delta$ be the probability of this set of words. Shannon's
theorem extends as follows:

- however close to 1 is $\delta < 1$, one can compress $u$ only down to $nH$ bits.

- however close to 0 is $\delta > 0$, one can achieve compression of $u$ down to $nH$
bits.

1.4 Algorithmic approach: Kolmogorov complexity 

1.4.1 Berry's paradox 

So far, we considered two kinds of binary codings for a word $u$ in alphabet
$a_1, \ldots, a_s$. The simplest one uses variable-length prefix-free codes (§1.2.2). The
other one codes the rank of $u$ as a member of some set (§1.2.5).
Clearly, there are plenty of other ways to encode any mathematical object. Why
not consider all of them? And define the information content of a mathematical
object $x$ as the length of the shortest unambiguous description of $x$ (written as a
binary word). Though quite appealing, this notion is ill defined, as stressed by
Berry's paradox³:

Let N be the lexicographically least binary word which cannot be
unambiguously described by any binary word of length less than 1000.

This description of N contains 133 symbols of written English (including spaces
and the final period) and, using ASCII codes, can be written as a binary word of
length 133 × 7 = 931. Assuming such a description to be well defined would lead
to an unambiguous description of N in 931 bits, hence less than 1000, a
contradiction to the definition of N.
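The symbol count for Berry's sentence (in the wording with "unambiguously") can be checked mechanically, using the 7-bit ASCII coding of §1.1.2:

```python
sentence = ("Let N be the lexicographically least binary word which cannot be "
            "unambiguously described by any binary word of length less than 1000.")
symbols = len(sentence)          # letters, digits, spaces and the final period
bits = 7 * symbols               # 7 bits per ASCII code
print(symbols, bits, bits < 1000)   # 133 931 True
```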

The solution to this inconsistency is clear: the quite vague notion of unambiguous
description entering Berry's paradox is used both inside the sentence describing
N and inside the argument to get the contradiction. A clash between two levels:

• the would-be formal level carrying the description of N,

• and the meta level which carries the inconsistency argument.

Any formalization of the notion of description should drastically reduce its scope 
and totally forbid any clash such as the above one. 

1.4.2 The turn to computability 

To get around the stumbling block of Berry's paradox and have a formal notion
of description with wide scope, Andrei Nikolaievitch Kolmogorov (1903-1987)
made an ingenious move: he turned to computability and replaced description
by computation program, exploiting the successful formalization of this a priori
vague notion which was achieved in the thirties⁴. This approach was first
announced by Kolmogorov in [24], 1963, and then developed in [25], 1965. Similar
approaches were also independently developed by Solomonoff in [50], 1964, and
by Chaitin in [6, 7], 1966-1969.

3 Berry's paradox is mentioned by Bertrand Russell in 1908 ([44], p. 222 or 150), who
credited G.G. Berry, an Oxford librarian, for the suggestion.

1.4.3 Digression on computability theory 

The formalized notion of computable function (also called recursive function)
goes along with that of partial computable function (also called partial recursive
function) which should rather be called partially computable partial function,
i.e., the partial character has to be distributed⁵.
So, there are two theories:

• the theory of computable functions, 

• the theory of partial computable functions. 

The "right" theory, the one with a cornucopia of spectacular results, is that of 
partial computable functions. 

Let us pick up three fundamental results out of the cornucopia, which
we state in terms of computers and programming languages. Let
$\mathbb{I}$ and $\mathbb{O}$ be $\mathbb{N}$ or $A^*$ where $A$ is some finite or countably infinite
alphabet (or, more generally, $\mathbb{I}$ and $\mathbb{O}$ can be elementary sets, cf.
Definition 1.9).

Theorem 1.7.

1. [Enumeration theorem] The function which executes programs on
their inputs: (program, input) → output is itself partial computable.
Formally, this means that there exists a partial computable function

$$U : \{0,1\}^* \times \mathbb{I} \to \mathbb{O}$$

such that the family of partial computable functions $\mathbb{I} \to \mathbb{O}$ is exactly
$\{U_e \mid e \in \{0,1\}^*\}$ where $U_e(x) = U(e,x)$.

Such a function $U$ is called universal for partial computable functions
$\mathbb{I} \to \mathbb{O}$.

2. [Parameter theorem (or $s^m_n$ theorem)] One can exchange input and
program (this is von Neumann's key idea for computers).
Formally, this means that, letting $\mathbb{I} = \mathbb{I}_1 \times \mathbb{I}_2$, universal maps
$U_{\mathbb{I}_1 \times \mathbb{I}_2}$ and $U_{\mathbb{I}_2}$ are such that there exists a computable total map
$s : \{0,1\}^* \times \mathbb{I}_1 \to \{0,1\}^*$ such that, for all $e \in \{0,1\}^*$, $x_1 \in \mathbb{I}_1$ and
$x_2 \in \mathbb{I}_2$,

$$U_{\mathbb{I}_1 \times \mathbb{I}_2}(e, (x_1, x_2)) = U_{\mathbb{I}_2}(s(e, x_1), x_2)$$

4 Through the works of Alonzo Church (via lambda calculus), Alan Mathison Turing (via
Turing machines), Kurt Gödel and Jacques Herbrand (via Herbrand-Gödel systems of
equations) and Stephen Cole Kleene (via the recursion and minimization operators).

5 In French, Daniel Lacombe ([27], 1960) used the expression semi-fonction semi-récursive.
3. [Kleene fixed point theorem] For any transformation of programs,
there is a program which does the same input → output job as its
transformed program⁶.

Formally, this means that, for every partial computable map
$f : \{0,1\}^* \to \{0,1\}^*$, there exists $e \in \{0,1\}^*$ such that

$$\forall x \in \mathbb{I} \quad U(f(e), x) = U(e, x)$$



1.4.4 Kolmogorov complexity (or program size complexity) 

Turning to computability, the basic idea for Kolmogorov complexity can be
summed up by the following equation:

$$\text{description} = \text{program}$$

When we say "program", we mean a program taken from a family of programs,
i.e., written in a programming language or describing a Turing machine or a
system of Herbrand-Gödel equations or a Post system,...

Since we are soon going to consider the length of programs, following what has
been said in §1.1.2, we normalize programs: they will be binary words, i.e.,
elements of $\{0,1\}^*$.

So, we have to fix a function $\varphi : \{0,1\}^* \to \mathbb{O}$ and consider that the output of a
program $p$ is $\varphi(p)$.

Which $\varphi$ are we to consider? Since we know that there are universal partial
computable functions (i.e., functions able to emulate any other partial computable
function modulo a computable transformation of programs, in other words, a
compiler from one language to another), it is natural to consider universal partial
computable functions. This agrees with what has been said in §1.4.3.
Let us give the general definition of the Kolmogorov complexity associated to
any function $\{0,1\}^* \to \mathbb{O}$.

Definition 1.8. If $\varphi : \{0,1\}^* \to \mathbb{O}$ is a partial function, set $K_\varphi : \mathbb{O} \to \mathbb{N} \cup \{+\infty\}$

$$K_\varphi(y) = \min\{|p| : \varphi(p) = y\}$$

with the convention that $\min \emptyset = +\infty$.

Intuition: $p$ is a program (with no input), $\varphi$ executes programs (i.e., $\varphi$ is
altogether a programming language plus a compiler plus a machinery to run
programs) and $\varphi(p)$ is the output of the run of program $p$. Thus, for $y \in \mathbb{O}$, $K_\varphi(y)$
is the length of shortest programs $p$ with which $\varphi$ computes $y$ (i.e., $\varphi(p) = y$).
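To make Definition 1.8 concrete, here is a sketch with a hypothetical toy interpreter $\varphi$. Programs `0w` print $w$ literally (reading 0 as a and 1 as b), and programs `1w` print $a$ repeated $w$ times ($w$ read in binary). $K_\varphi$ is then computed by exhaustive search over programs in order of increasing length. This search only works because this toy $\varphi$ always halts. For a genuine partial computable $\varphi$, the search need not terminate, and Kolmogorov complexity is not computable in general.

```python
from itertools import product

def phi(p: str):
    """Hypothetical toy interpreter from binary programs to words over {a, b}.

    '0' + w : print w with 0 -> a, 1 -> b (a literal copy)
    '1' + w : print 'a' repeated int(w, 2) times (w non-empty)
    Anything else is undefined (None).
    """
    if p.startswith("0"):
        return p[1:].replace("0", "a").replace("1", "b")
    if p.startswith("1") and len(p) > 1:
        return "a" * int(p[1:], 2)
    return None

def K_phi(y: str, max_len: int = 20):
    """min{|p| : phi(p) = y}, by exhaustive search over programs of length <= max_len."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            if phi("".join(bits)) == y:
                return length
    return float("inf")   # stands for +infinity (no program found up to max_len)

print(K_phi("abba"))       # 5: the literal program '00110'
print(K_phi("a" * 1000))   # 11: '1' followed by the 10 binary digits of 1000
```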

As said above, we shall consider this definition for partial computable functions
$\{0,1\}^* \to \mathbb{O}$. Of course, this forces us to consider a set $\mathbb{O}$ endowed with a
computability structure. Hence the choice of sets that we shall call elementary,
which do not exhaust all possible ones but will suffice for the results mentioned
in this paper.

6 This is the seed of computer virology, cf. [1].

7 Delahaye's books [11, 12] present a very attractive survey on Kolmogorov complexity.
Definition 1.9. The family of elementary sets is obtained as follows:

- it contains $\mathbb{N}$ and the $A^*$'s where $A$ is a finite or countable alphabet,

- it is closed under finite (non-empty) product, product with any non-empty finite
set and the finite sequence operator.

Note. Closure under the finite sequence operator is used to encode formulas in 
Theorem [2^1 



1.4.5 The invariance theorem 



The problem with Definition 1.8 is that $K_\varphi$ strongly depends on $\varphi$. Here comes a
remarkable result, the invariance theorem, which ensures that there is a smallest
$K_\varphi$, up to a constant. It turns out that the proof of this theorem only needs
the enumeration theorem and makes no use of the parameter theorem (usually
omnipresent in computability theory).

Theorem 1.10 (Invariance theorem, Kolmogorov, [25], 1965). Let $\mathbb{O}$ be an
elementary set (cf. Definition 1.9). Among the $K_\varphi$'s, where $\varphi : \{0,1\}^* \to \mathbb{O}$
varies in the family $PC^{\mathbb{O}}$ of partial computable functions, there is a smallest
one, up to an additive constant (= within some bounded interval). I.e.

$$\exists V \in PC^{\mathbb{O}}\ \forall \varphi \in PC^{\mathbb{O}}\ \exists c\ \forall y \in \mathbb{O} \quad K_V(y) \leq K_\varphi(y) + c$$

Such a $V$ is called optimal.

Moreover, any universal partial computable function $\{0,1\}^* \to \mathbb{O}$ is optimal.

Proof. Let $U : \{0,1\}^* \times \{0,1\}^* \to \mathbb{O}$ be partial computable and
universal for partial computable functions $\{0,1\}^* \to \mathbb{O}$ (cf. point 1
of Theorem 1.7).

Let $c : \{0,1\}^* \times \{0,1\}^* \to \{0,1\}^*$ be a total computable injective
map such that $|c(e,x)| = 2|e| + |x| + 1$ (cf. Proposition 1.5).
Define $V : \{0,1\}^* \to \mathbb{O}$, with domain included in the range of $c$, as
follows:

$$\forall e \in \{0,1\}^*\ \forall x \in \{0,1\}^* \quad V(c(e,x)) = U(e,x)$$

where equality means that both sides are simultaneously defined or
not. Then, for every partial computable function $\varphi : \{0,1\}^* \to \mathbb{O}$,
for every $y \in \mathbb{O}$, if $\varphi = U_e$ (i.e., $\varphi(x) = U(e,x)$ for all $x$, cf. point 1
of Theorem 1.7) then

$$\begin{aligned}
K_V(y) &= \text{least } |p| \text{ such that } V(p) = y \\
&\leq \text{least } |c(e,x)| \text{ such that } V(c(e,x)) = y \quad \text{(least is relative to } x \text{ since } e \text{ is fixed)} \\
&= \text{least } |c(e,x)| \text{ such that } U(e,x) = y \\
&= \text{least } |x| + 2|e| + 1 \text{ such that } \varphi(x) = y \quad \text{(since } |c(e,x)| = |x| + 2|e| + 1 \text{ and } \varphi(x) = U(e,x)\text{)} \\
&= (\text{least } |x| \text{ such that } \varphi(x) = y) + 2|e| + 1 \\
&= K_\varphi(y) + 2|e| + 1
\end{aligned}$$

□
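The mechanics of this proof can be imitated on a small scale. In the sketch below, a hypothetical finite table of total interpreters indexed by binary words $e$ stands in for the universal function $U$. A genuine universal $U$ enumerates all partial computable functions, so this is only an illustration of how $V$ is built from $U$ and the pairing $c(e,x) = 1^{|e|}0ex$. The brute-force search then checks $K_V(y) \leq K_{U_e}(y) + 2|e| + 1$ on one example:

```python
from itertools import product

# Hypothetical stand-in for U(e, x): a finite table of total interpreters
# indexed by binary words e. A genuine universal U enumerates all partial
# computable functions; this table only mimics the mechanics of the proof.
def literal(x: str):               # prints x itself
    return x

def ones(x: str):                  # prints '1' repeated int(x, 2) times
    return "1" * int(x, 2) if x else None

U_TABLE = {"0": literal, "10": ones}

def c(e: str, x: str) -> str:
    """Injective pairing c(e, x) = 1^{|e|} 0 e x, of length 2|e| + |x| + 1."""
    return "1" * len(e) + "0" + e + x

def V(p: str):
    """V(c(e, x)) = U(e, x); None (undefined) outside the range of c or the table."""
    k = p.find("0")
    if k < 0 or len(p) < 2 * k + 1:
        return None
    e, x = p[k + 1:2 * k + 1], p[2 * k + 1:]
    f = U_TABLE.get(e)
    return f(x) if f else None

def K(machine, y: str, max_len: int = 16):
    """Brute-force min{|p| : machine(p) = y}; inf if nothing up to max_len."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            if machine("".join(bits)) == y:
                return length
    return float("inf")

y = "1" * 12
for e, f in U_TABLE.items():
    print(f"e={e}: K_V={K(V, y)} <= K_phi + 2|e| + 1 = {K(f, y) + 2 * len(e) + 1}")
```

With $y = 1^{12}$, the literal interpreter needs 12 bits and the `ones` interpreter needs 4 (the binary word 1100). $V$ therefore needs $4 + 2 \cdot 2 + 1 = 9$ bits, so the bound of the proof is attained for $e = 10$.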
Using the invariance theorem, the Kolmogorov complexity $K^{\mathbb{O}} : \mathbb{O} \to \mathbb{N}$ is
defined as $K_V$ where $V$ is any fixed optimal function. The arbitrariness of the
choice of $V$ does not drastically modify $K_V$: it changes it only up to a constant.

Definition 1.11. Kolmogorov complexity $K^{\mathbb{O}} : \mathbb{O} \to \mathbb{N}$ is $K_V$, where $V$ is some
fixed optimal partial function $\{0,1\}^* \to \mathbb{O}$. When $\mathbb{O}$ is clear from context, we
shall simply write $K$.

$K$ is therefore minimum among the $K_\varphi$'s, up to an additive constant.

$K$ is defined up to an additive constant: if $V$ and $V'$ are both optimal then

$$\exists c\ \forall x \in \mathbb{O} \quad |K_V(x) - K_{V'}(x)| \leq c$$

1.4.6 What Kolmogorov said about the constant 

So Kolmogorov complexity is an integer defined up to a constant...! But the
constant is uniformly bounded for $x \in \mathbb{O}$.

Let us quote what Kolmogorov said about the constant in [25], 1965:

Of course, one can avoid the indeterminacies associated with the 
[above] constants, by considering particular [. . .functions V], but it 
is doubtful that this can be done without explicit arbitrariness. 
One must, however, suppose that the different "reasonable" [above 
optimal functions] will lead to "complexity estimates" that will con- 
verge on hundreds of bits instead of tens of thousands. 
Hence, such quantities as the "complexity" of the text of "War and 
Peace" can be assumed to be defined with what amounts to unique- 
ness. 

In fact, this constant witnesses the multitude of models of computation:
universal Turing machines, universal cellular automata, Herbrand-Gödel systems
of equations, Post systems, Kleene definitions,... If we feel that one of them
is canonical then we may consider the associated Kolmogorov complexity as
the right one and forget about the constant. This has been developed for
Schönfinkel-Curry combinators S, K, I by Tromp, cf. [31] §3.2.2-3.2.6.
However, even if we fix a particular $K_V$, the importance of the invariance
theorem remains since it tells us that $K$ is less than any $K_\varphi$ (up to a constant). A
result which is applied again and again to develop the theory.

1.4.7 Considering inputs: conditional Kolmogorov complexity 

In the enumeration theorem, we considered (program, input) → output functions
(cf. Theorem 1.7). Then, in the definition of Kolmogorov complexity, we gave
up the inputs, dealing with program → output functions.

Conditional Kolmogorov complexity deals with the inputs. Instead of measuring
the information content of $y \in \mathbb{O}$, we measure it given as free some object
$z$, which may help to compute $y$. A trivial case is when $z = y$: then the
information content of $y$ given $y$ is null (up to an additive constant). In fact,
there is an obvious program which outputs exactly its input, whatever the input.
Let us mention that, in computer science, inputs are also considered as
environments.

Let us state the formal definition and the adequate invariance theorem. 

Definition 1.12. If $\varphi : \{0,1\}^* \times \mathbb{I} \to \mathbb{O}$ is a partial function, set
$K_\varphi(\ |\ ) : \mathbb{O} \times \mathbb{I} \to \mathbb{N} \cup \{+\infty\}$

$$K_\varphi(y \mid z) = \min\{|p| : \varphi(p, z) = y\}$$

Intuition: $p$ is a program (which expects an input $z$), $\varphi$ executes programs (i.e., $\varphi$
is altogether a programming language plus a compiler plus a machinery to run
programs) and $\varphi(p, z)$ is the output of the run of program $p$ on input $z$. Thus,
for $y \in \mathbb{O}$, $K_\varphi(y \mid z)$ is the length of shortest programs $p$ with which $\varphi$ computes
$y$ on input $z$ (i.e., $\varphi(p, z) = y$).

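Theorem 1.13 (Invariance theorem for conditional complexity). Among the $K_\varphi(\ \mid\ )$'s, where $\varphi$ varies in the family $PC^{\{0,1\}^* \times \mathbb{I} \to \mathbb{O}}$ of partial computable functions $\{0,1\}^* \times \mathbb{I} \to \mathbb{O}$, there is a smallest one, up to an additive constant (i.e., within some bounded interval):

$$\exists V \in PC^{\{0,1\}^* \times \mathbb{I} \to \mathbb{O}}\ \forall \varphi \in PC^{\{0,1\}^* \times \mathbb{I} \to \mathbb{O}}\ \exists c\ \forall y \in \mathbb{O}\ \forall z \in \mathbb{I}\quad K_V(y \mid z) \le K_\varphi(y \mid z) + c$$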

Such a V is called optimal. 

Moreover, any universal partial computable map $\{0,1\}^* \times \mathbb{I} \to \mathbb{O}$ is optimal. The proof is similar to that of Theorem 1.10.

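Definition 1.14. $K^{\mathbb{I} \to \mathbb{O}} : \mathbb{O} \times \mathbb{I} \to \mathbb{N}$ is $K_V(\ \mid\ )$ where $V$ is some fixed optimal partial function.

$K^{\mathbb{I} \to \mathbb{O}}$ is defined up to an additive constant: if $V$ and $V'$ are both optimal then

$$\exists c\ \forall y \in \mathbb{O}\ \forall z \in \mathbb{I}\quad |K_V(y \mid z) - K_{V'}(y \mid z)| \le c$$

Again, an integer defined up to a constant...! However, the constant is uniform in $y \in \mathbb{O}$ and $z \in \mathbb{I}$.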

1.4.8 Simple upper bounds for Kolmogorov complexity 

Finally, let us mention rather trivial upper bounds: 

- the information content of a word is at most its length. 

- conditional complexity cannot be larger than the unconditional one.

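Proposition 1.15.

1. There exists $c$ such that

$$\forall x \in \{0,1\}^*\ \ K^{\{0,1\}^*}(x) \le |x| + c, \qquad \forall n \ge 1\ \ K^{\mathbb{N}}(n) \le \log(n) + c$$

2. There exists $c$ such that

$$\forall x \in \mathbb{O}\ \forall y \in \mathbb{I}\quad K^{\mathbb{I} \to \mathbb{O}}(x \mid y) \le K^{\mathbb{O}}(x) + c$$

3. Let $f : \mathbb{O} \to \mathbb{O}'$ be computable. There exists $c$ such that

$$\forall x \in \mathbb{O}\quad K^{\mathbb{O}'}(f(x)) \le K^{\mathbb{O}}(x) + c$$

$$\forall x \in \mathbb{O}\ \forall y \in \mathbb{I}\quad K^{\mathbb{I} \to \mathbb{O}'}(f(x) \mid y) \le K^{\mathbb{I} \to \mathbb{O}}(x \mid y) + c$$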

Proof. We only prove 1. Let $Id : \{0,1\}^* \to \{0,1\}^*$ be the identity function. The invariance theorem ensures that there exists $c$ such that $K^{\{0,1\}^*} \le K^{\{0,1\}^*}_{Id} + c$. Now, it is easy to see that $K^{\{0,1\}^*}_{Id}(x) = |x|$, so that $K^{\{0,1\}^*}(x) \le |x| + c$.

Let $\theta : \{0,1\}^* \to \mathbb{N}$ be the function (which is, in fact, a bijection) which associates to a word $u = a_{k-1} \ldots a_0$ the integer

$$\theta(u) = (2^k + a_{k-1} 2^{k-1} + \cdots + 2a_1 + a_0) - 1$$

(i.e., the predecessor of the integer with binary representation $1u$). Clearly, $K_\theta(n) = \lfloor \log(n+1) \rfloor$. The invariance theorem ensures that there exists $c$ such that $K^{\mathbb{N}} \le K_\theta + c$. Hence $K^{\mathbb{N}}(n) \le \log(n) + c + 1$ for all $n \ge 1$. □
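The bijection $\theta$ is easy to check mechanically. This Python sketch (the function names are ours) computes $\theta$ and its inverse and confirms that the word coding $n$ has length $\lfloor \log(n+1) \rfloor$, which is the fact used to evaluate $K_\theta$.

```python
from math import floor, log2


def theta(u: str) -> int:
    """theta(a_{k-1}...a_0): the integer written 1u in binary, minus one."""
    return int("1" + u, 2) - 1


def theta_inverse(n: int) -> str:
    """The unique word u with theta(u) == n: drop the leading 1 of n + 1 in binary."""
    return bin(n + 1)[3:]   # bin() yields '0b1...'; strip '0b' and the leading '1'


# theta enumerates {0,1}* shortest words first, lexicographically within each length
print([theta_inverse(n) for n in range(7)])   # ['', '0', '1', '00', '01', '10', '11']
```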

The following technical property is a variation of an argument already used in §1.2.5: the rank of an element in a set defines this element, and if the set is computable, so is this process.

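Proposition 1.16. Let $A \subseteq \mathbb{N} \times \mathbb{O}$ be computable such that $A_n = A \cap (\{n\} \times \mathbb{O})$ is finite for all $n$. Then, letting $\sharp X$ be the number of elements of $X$,

$$\exists c\ \forall n\ \forall x \in A_n\quad K(x \mid n) \le \log(\sharp A_n) + c$$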

Proof. Observe that $x$ is determined by its rank in $A_n$. This rank is an integer $< \sharp A_n$, hence its binary representation has length $\le \lfloor \log(\sharp A_n) \rfloor + 1$. □
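The rank argument can be sketched in Python. Purely for illustration, we assume a hypothetical computable set $A$ whose slice $A_n$ consists of the words of length $n$ with no two consecutive 1's (names `in_A`, `members`, `encode`, `decode` are ours). Since here $A_n \subseteq \{0,1\}^n$, the decoder may restrict its enumeration to words of length $n$; in general one enumerates $\mathbb{O}$ and keeps the elements of $A_n$.

```python
from itertools import product
from math import floor, log2


def in_A(n: int, x: str) -> bool:
    """Hypothetical computable set A: x is in A_n iff |x| = n and x has no two consecutive 1's."""
    return len(x) == n and "11" not in x


def members(n: int) -> list:
    """Elements of A_n in a fixed computable order (lexicographic among words of length n)."""
    return [w for w in ("".join(b) for b in product("01", repeat=n)) if in_A(n, w)]


def encode(n: int, x: str) -> str:
    """Binary representation of the rank of x in A_n."""
    return bin(members(n).index(x))[2:]


def decode(n: int, code: str) -> str:
    """Recover x from n (the condition) and its rank (the program)."""
    return members(n)[int(code, 2)]
```

With this choice $\sharp A_8 = 55$, so every code word for an element of $A_8$ has at most $\lfloor \log 55 \rfloor + 1 = 6$ bits, and `decode(8, encode(8, x))` gives back `x`.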

2 Kolmogorov complexity and undecidability 

2.1 K is unbounded 

Let $K = K_V : \mathbb{O} \to \mathbb{N}$ where $V : \{0,1\}^* \to \mathbb{O}$ is optimal (cf. Theorem 1.10). Since there are finitely many programs of size $\le n$ (namely, the $2^{n+1} - 1$ binary words of size $\le n$), there are finitely many elements of $\mathbb{O}$ with Kolmogorov complexity at most $n$. Since $\mathbb{O}$ is infinite, this shows that $K$ is unbounded.

2.2 K is not computable 

Berry's paradox (cf. §1.4.1) has a counterpart in terms of Kolmogorov complexity: it gives a very simple proof that $K$, which is a total function $\mathbb{O} \to \mathbb{N}$, is not computable.

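Proof that K is not computable. For simplicity of notations, we consider the case $\mathbb{O} = \mathbb{N}$. Define $L : \mathbb{N} \to \mathbb{O}$ as follows:

$$L(n) = \text{least } k \text{ such that } K(k) \ge 2n$$

So that $K(L(n)) \ge 2n$ for all $n$. If $K$ were computable so would be $L$. Let $V : \{0,1\}^* \to \mathbb{N}$ be optimal, i.e., $K = K_V$. The invariance theorem ensures that there exists $c$ such that $K \le K_L + c$ (here $L$ is viewed as a function of programs by identifying $\mathbb{N}$ with $\{0,1\}^*$ via the bijection $\theta$ of the proof of Proposition 1.15). Observe that $K_L(L(n)) \le n$ by definition of $K_L$, since the word coding $n$ has length $\lfloor \log(n+1) \rfloor \le n$. Thus,

$$2n \le K(L(n)) \le K_L(L(n)) + c \le n + c$$

A contradiction for $n > c$. □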

The non computability of K can be seen as a version of the undecidability of the 
halting problem. In fact, there is a simple way to compute K when the halting 
problem is used as an oracle. To get the value of K(x), proceed as follows: 

- enumerate the programs in $\{0,1\}^*$ in length-lexicographic order (shorter programs first, lexicographic order within each length),

- for each program $p$, check if $V(p)$ halts (using the oracle),

- in case $V(p)$ halts then compute its value,

- halt and output $|p|$ when some $p$ is obtained such that $V(p) = x$.
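The oracle procedure can be sketched in Python. No real halting oracle exists, so the sketch takes the oracle as a parameter; for the demonstration we assume a hypothetical toy machine whose halting problem is trivially decidable (programs `0w` print `w`, programs `10w` print `ww`, all other programs run forever), so that `toy_halts` can play the role of the oracle. The enumeration must list shorter programs first, otherwise the first program found would not be a shortest one.

```python
from itertools import count, product
from typing import Callable


def shortlex():
    """All binary words, shortest first and lexicographically within each length."""
    for n in count():
        for bits in product("01", repeat=n):
            yield "".join(bits)


def K_with_oracle(x: str, halts: Callable[[str], bool], run: Callable[[str], str]) -> int:
    """Length of the first program p, in shortlex order, such that V(p) halts and V(p) == x.

    `halts` stands for the halting oracle; `run` is only called on programs that halt.
    Loops forever if no program outputs x (this cannot happen for an optimal V).
    """
    for p in shortlex():
        if halts(p) and run(p) == x:
            return len(p)


def toy_halts(p: str) -> bool:
    """Hypothetical oracle for the toy machine below."""
    return p.startswith("0") or p.startswith("10")


def toy_run(p: str) -> str:
    """Hypothetical toy machine V: '0' + w prints w, '10' + w prints w twice."""
    return p[1:] if p.startswith("0") else p[2:] * 2
```

For instance, `K_with_oracle("0101", toy_halts, toy_run)` returns 4 (program `1001`), one less than the literal program `00101`.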

The converse is also true: one can prove that the halting problem is computable 
with K as an oracle. 

The argument for the non computability of $K$ can be used to prove a much stronger statement: $K$ cannot be bounded from below by any unbounded partial computable function.

Theorem 2.1 (Kolmogorov). There is no unbounded partial computable function $\varphi : \mathbb{O} \to \mathbb{N}$ such that $\varphi(x) \le K(x)$ for all $x$ in the domain of $\varphi$.

Of course, $K$ is bounded from above by a total computable function, cf. Proposition 1.15.

2.3 K is computable from above 

Though $K$ is not computable, it can be approximated from above. The idea is simple. Suppose $\mathbb{O} = \{0,1\}^*$. Let $c$ be as in point 1 of Proposition 1.15. Consider all programs of length less than $|x| + c$ and let them be executed during $t$ steps. If none of them converges and outputs $x$ then take $|x| + c$ as a $t$-bound. If some of them converge and output $x$ then the bound is the length of the shortest such program.

The limit of this process is $K(x)$; it is obtained at some finite step which we are not able to bound.

Formally, this means that there is some $F : \mathbb{O} \times \mathbb{N} \to \mathbb{N}$ which is computable and non-increasing in its second argument such that

$$K(x) = \lim_{t \to +\infty} F(x, t) = \min\{F(x, t) \mid t \in \mathbb{N}\}$$
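The approximation from above can be sketched in Python. The machine below is a hypothetical toy with an explicit step count: programs `0w` print `w` in $|p|$ steps, programs `10w` print $m$ zeros in $m$ steps where $m$ is the integer written `1w` in binary, and every other program runs forever. It is not universal, so `F` converges to the complexity relative to this toy machine only. The constant `C = 1` plays the role of $c$ in point 1 of Proposition 1.15, since the literal program `0x` has length $|x| + 1$.

```python
from itertools import product
from typing import Optional

C = 1  # the literal program '0' + x has length |x| + C


def words_up_to(n: int):
    """All binary words of length <= n, shortest first."""
    for length in range(n + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def run(p: str, t: int) -> Optional[str]:
    """Output of the toy machine on program p if it halts within t steps, else None."""
    if p.startswith("0"):                  # literal program, takes len(p) steps
        return p[1:] if len(p) <= t else None
    if p.startswith("10"):                 # run-length program, takes m steps
        m = int("1" + p[2:], 2)
        return "0" * m if m <= t else None
    return None                            # every other program runs forever


def F(x: str, t: int) -> int:
    """Upper bound on the toy complexity of x known after t steps; non-increasing in t."""
    best = len(x) + C
    for p in words_up_to(len(x) + C - 1):  # programs of length < |x| + C
        if run(p, t) == x:
            best = min(best, len(p))
    return best
```

For $x = 0^8$, `F(x, t)` equals 9 for $t \le 7$ and drops to 5 (program `10000`) from $t = 8$ on; nothing in the procedure tells us that this value is final.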

2.4 Kolmogorov complexity and Gödel's incompleteness theorem

A striking version of Gödel's incompleteness theorem has been given by Chaitin in [8, 9], 1971-1974, in terms of Kolmogorov complexity. Since Gödel's celebrated proof of the incompleteness theorem, we know that, in the language of arithmetic, one can formalize computability and logic. In particular, one can formalize Kolmogorov complexity and statements about it. Chaitin proves a version of the incompleteness theorem which ensures that among true unprovable formulas there are all true statements $K(u) > n$ for $n$ large enough.

Theorem 2.2 (Chaitin, [5], 1974). Let $T$ be a computably enumerable set of axioms in the language of arithmetic. Suppose that all axioms in $T$ are true in the standard model of arithmetic with base $\mathbb{N}$. Then there exists $N$ such that if $T$ proves $K(u) > n$ (with $u \in \{0,1\}^*$ and $n \in \mathbb{N}$) then $n < N$.

Chaitin has given a remarkable analysis of how the constant $N$ depends on $T$. To that purpose, he extends Kolmogorov complexity to computably enumerable sets.

Definition 2.3 (Chaitin, [9], 1974). Let $\mathbb{O}$ be an elementary set (cf. Definition 1.9) and $\mathcal{CE}$ be the family of computably enumerable (c.e.) subsets of $\mathbb{O}$. To any partial computable $\varphi : \{0,1\}^* \times \mathbb{N} \to \mathbb{O}$, associate the Kolmogorov complexity $K_\varphi : \mathcal{CE} \to \mathbb{N}$ such that, for every c.e. subset $T$ of $\mathbb{O}$,

$$K_\varphi(T) = \min\{|p| \mid T = \{\varphi(p, t) \mid t \in \mathbb{N}\}\}$$

(observe that $\{\varphi(p, t) \mid t \in \mathbb{N}\}$ is always c.e. and any c.e. subset of $\mathbb{O}$ can be obtained in this way for some $\varphi$).

The invariance theorem still holds for this notion of Kolmogorov complexity, 
leading to the following notion. 

Definition 2.4 (Chaitin, [9], 1974). $K^{\mathcal{CE}} : \mathcal{CE} \to \mathbb{N}$ is $K_V$ where $V$ is some fixed optimal partial function. It is defined up to an additive constant.

We can now state how the constant $N$ in Theorem 2.2 depends on the theory $T$.

Theorem 2.5 (Chaitin, [9], 1974). There exists a constant $c$ such that, for all c.e. sets $T$ satisfying the hypothesis of Theorem 2.2, the associated constant $N$ is such that

$$N \le K^{\mathcal{CE}}(T) + c$$

Chaitin also reformulates Theorem 2.2 as follows:

If $T$ consists of true formulas then it cannot prove that a string has Kolmogorov complexity greater than the Kolmogorov complexity of $T$ itself (up to a constant independent of $T$).

Remark 2.6. The previous statement, and Chaitin's assertion that the Kolmogorov complexity of $T$ somehow measures the power of $T$ as a theory, have been much criticized in van Lambalgen ([38], 1989), Fallis ([13], 1996) and Raatikainen ([43], 1998). Raatikainen's main argument in [43] against Chaitin's interpretation is that the constant in Theorem 2.2 strongly depends on the choice of the optimal function $V$ such that $K = K_V$. Indeed, for any fixed theory $T$, one can choose such a $V$ so that the constant is zero! And one can also choose $V$ so that the constant is arbitrarily large.

Though these arguments are perfectly sound, we disagree with the criticisms drawn from them. Let us detail three main rebuttals.

• First, such arguments are based on the use of optimal functions associated to very unnatural universal functions $V$ (cf. point 1 of Theorem 1.7 and the last assertion of Theorem 1.10). It has since been recognized that universality is not always sufficient to get smooth results. Universality by prefix adjunction is sometimes required (cf., for instance, §2.1 and §6 in Becher, Figueira, Grigorieff & Miller, 2006). This means that, for an enumeration $(\varphi_e)_{e \in E}$ of partial computable functions indexed by a prefix-free set $E \subseteq \{0,1\}^*$, the optimal function $V$ is to satisfy the equality $V(ep) = \varphi_e(p)$ for all $e \in E$ and $p \in \{0,1\}^*$, where $ep$ is the concatenation of the strings $e$ and $p$ (prefix-freeness of $E$ ensures that $e$ and $p$ can be recovered from $ep$).

• Second, and more important than the above technical counterargument, it is a simple fact that modelization rarely rules out all pathological cases. It is intended to be used in "reasonable" cases. Of course, this may be misleading, but perfect modelization is illusory. In our opinion, this is best illustrated by Kolmogorov's quotation in §1.4.6, to which Raatikainen's argument could be applied mutatis mutandis: there are optimal functions for which the complexity of the text of "War and Peace" is null and other ones for which it is arbitrarily large. Nevertheless, this does not prevent Kolmogorov from asserting (in the founding paper of the theory [25]): [For] "reasonable" [above optimal functions], such quantities as the "complexity" of the text of "War and Peace" can be assumed to be defined with what amounts to uniqueness.

• Third, a final technical answer to such criticisms has been recently provided by Calude & Jürgensen in [5], 2005. They improve the incompleteness result given by Theorem 2.2, proving that, for a class of formulas in the vein of those in that theorem, the probability that such a formula of length $n$ is provable tends to zero when $n$ tends to infinity, whereas the probability that it is true has a strictly positive lower bound.
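Universality by prefix adjunction, mentioned in the first rebuttal above, can be illustrated in Python. This is only a sketch: the enumeration $(\varphi_e)$ is replaced by a finite list of hypothetical toy functions, and the prefix-free set of indices is taken to be $\{1^k0 \mid k \in \mathbb{N}\}$ (any computable prefix-free coding of indices would do). Because no index is a prefix of another, the decomposition $ep$ of an input word is unique, so $V(ep) = \varphi_e(p)$ is well defined.

```python
from typing import Callable, List, Optional


def index_code(k: int) -> str:
    """Hypothetical prefix-free code for index k: the word 1^k 0."""
    return "1" * k + "0"


def universal_by_adjunction(phis: List[Callable[[str], str]]) -> Callable[[str], Optional[str]]:
    """Build V such that V(index_code(k) + p) == phis[k](p) for all k and p."""
    def V(word: str) -> Optional[str]:
        k = word.find("0")           # the index 1^k 0 ends at the first 0
        if k == -1 or k >= len(phis):
            return None              # incomplete index, or index outside the toy list
        return phis[k](word[k + 1:])
    return V


toy_phis = [
    lambda p: p,                     # phi_0: identity
    lambda p: p[::-1],               # phi_1: mirror image
    lambda p: p.replace("0", ""),    # phi_2: erase the zeros
]
V = universal_by_adjunction(toy_phis)
```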

3 Kolmogorov complexity: some variations 

Note. The notations for (plain) Kolmogorov complexity (that of §1.4.5) and its prefix version (cf. §3.3) may cause some confusion. They were long denoted respectively by $K$ and $H$ in the literature. But in their book [31] (first edition, 1993), Li & Vitányi denoted them respectively by $C$ and $K$. Due to the large success of this book, these last notations have since been used in many papers, so that two incompatible notations now appear in the literature. In this paper, we stick to the traditional notations $K$ and $H$.
3.1 Levin monotone complexity

Kolmogorov complexity is non monotone, be it on $\mathbb{N}$ with the natural ordering or on $\{0,1\}^*$ with the lexicographic ordering. In fact, for every $n$ and $c$, a proportion at least $1 - 2^{-c}$ of the strings of length $n$ have complexity $\ge n - c$ (cf. Proposition 4.2). However, since $n \mapsto 1^n$ is computable, $K(1^n) \le K(n) + O(1) \le \log n + O(1)$ (cf. points 3 and 1 of Proposition 1.15), which is much less than $n - c$ for $n$ large enough.

Leonid Levin ([29], 1973) introduced a monotone version of Kolmogorov complexity. The idea is to consider possibly infinite computations of Turing machines which never erase anything on the output tape. Such machines have finite or infinite outputs and compute total maps $\{0,1\}^* \to \{0,1\}^{\le\omega}$ where $\{0,1\}^{\le\omega} = \{0,1\}^* \cup \{0,1\}^{\mathbb{N}}$ is the family of finite or infinite binary strings. These maps can also be viewed as limit maps $p \mapsto \sup_{t \in \mathbb{N}} \varphi(p, t)$ (supremum for the prefix ordering) where $\varphi : \{0,1\}^* \times \mathbb{N} \to \{0,1\}^*$ is total and monotone non decreasing in its second argument.

To each such map $\varphi$, Levin associates a monotone non decreasing map $K^{mon}_\varphi : \{0,1\}^* \to \mathbb{N}$ such that

$$K^{mon}_\varphi(x) = \min\{|p| \mid \exists t\ \ x \le_{pref} \varphi(p, t)\}$$

Theorem 3.1 (Levin [29], 1973).

1. If $\varphi$ is total computable and monotone non decreasing in its second argument then $K^{mon}_\varphi : \{0,1\}^* \to \mathbb{N}$ is monotone non decreasing:

$$x \le_{pref} y \Rightarrow K^{mon}_\varphi(x) \le K^{mon}_\varphi(y)$$

2. Among the $K^{mon}_\varphi$'s, $\varphi$ total computable monotone non decreasing in its second argument, there exists a smallest one, up to a constant.

Considering total $\varphi$'s in the above theorem is a priori surprising since there is no computable enumeration of total computable functions, and the proof of the Invariance Theorem 1.10 is based on the enumeration theorem (cf. Theorem 1.7). The trick to overcome that problem is as follows.

• Consider all partial computable $\psi : \{0,1\}^* \times \mathbb{N} \to \{0,1\}^*$ which are monotone non decreasing in their second argument (where defined).

• Associate to each such $\psi$ a total $\varphi$ defined as follows: $\varphi(p, t)$ is the largest $\psi(p, t')$ such that $t' \le t$ and $\psi(p, t')$ is defined within $t + 1$ computation steps, if there is such a $t'$. If there is none then $\varphi(p, t)$ is the empty word.

• Observe that $K^{mon}_\psi(x) = K^{mon}_\varphi(x)$.
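Here is a Python sketch of this trick. By assumption, a partial computable $\psi$ is represented through a step-bounded evaluator `psi_within(p, t, s)` which returns $\psi(p, t)$ when its computation halts within $s$ steps and `None` otherwise. `toy_psi_within` is a hypothetical example: $\psi(p, t)$ is the length-$t$ prefix of $p$, computed in $t^2$ steps, and the computation never halts for $t \ge 3$ when $p$ starts with `1`. Since $\psi$ is monotone in $t$, its values are prefix-comparable, so the largest one for the prefix ordering is simply the longest.

```python
from typing import Callable, Optional

PsiWithin = Callable[[str, int, int], Optional[str]]


def totalize(psi_within: PsiWithin) -> Callable[[str, int], str]:
    """Return the total map phi built from psi as in the construction above."""
    def phi(p: str, t: int) -> str:
        best = ""                                   # default value: the empty word
        for t2 in range(t + 1):                     # every t' <= t
            value = psi_within(p, t2, t + 1)        # psi(p, t') if computed within t + 1 steps
            if value is not None and len(value) > len(best):
                best = value                        # comparable values: longest = largest
        return best
    return phi


def toy_psi_within(p: str, t: int, s: int) -> Optional[str]:
    """Hypothetical psi: psi(p, t) is the length-t prefix of p, computed in t*t steps;
    the computation never halts when p starts with '1' and t >= 3."""
    if p.startswith("1") and t >= 3:
        return None
    return p[:t] if t * t <= s else None


phi = totalize(toy_psi_within)
```

The resulting `phi` is total and non decreasing in `t`, and its limit equals the supremum of the defined values of $\psi$: for instance `phi("110", t)` stabilizes at `"11"`.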

In §5.2.3 we shall see a remarkable property of Levin monotone complexity $K^{mon}$ concerning Martin-Löf random reals.
3.2 Schnorr process complexity

Another variant of Kolmogorov complexity has been introduced by Claus-Peter Schnorr in [37], 1973. It is based on the subclass of partial computable functions $\varphi : \{0,1\}^* \to \{0,1\}^*$ which are monotone non decreasing relative to the prefix ordering:

$$(*)\qquad (p \le_{pref} q \wedge \varphi(p), \varphi(q) \text{ are both defined}) \Rightarrow \varphi(p) \le_{pref} \varphi(q)$$

Why such a requirement on $\varphi$? The reason can be explained as follows.

• Consider a sequential composition (i.e., a pipeline) of two processes, formalized as two functions $f, g$. The first one takes an input $p$ and outputs $f(p)$, the second one takes $f(p)$ as input and outputs $g(f(p))$.

• Each process is supposed to be monotone: the first letter of $f(p)$ appears first, then the second one, etc. The same holds for the digits of $g(q)$, for any input $q$.

• More efficiency is obtained if one can develop the computation of $g$ on input $f(p)$ as soon as the letters of $f(p)$ appear. More precisely, suppose the prefix $q$ of $f(p)$ has already appeared but there is some delay to get the subsequent letters. Then we can compute $g(q)$. But this is useful only in case the output $g(q)$ is itself a prefix of $g(f(p))$. This last condition is exactly the requirement (*).
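The requirement (*) can be checked by brute force on short words. In this Python sketch, `f` (write each bit twice) and `g` (complement each bit) are hypothetical total processes chosen only for illustration; `partial_outputs` lists what `g` can already print while `f(p)` is still arriving, and each of these partial outputs is a prefix of the final $g(f(p))$. Mirror image, by contrast, violates (*).

```python
from itertools import product
from typing import Callable, List


def is_prefix(u: str, v: str) -> bool:
    return v.startswith(u)


def f(p: str) -> str:
    """Hypothetical first process: writes each bit of its input twice."""
    return "".join(b + b for b in p)


def g(q: str) -> str:
    """Hypothetical second process: writes the complement of each bit of its input."""
    return "".join("1" if b == "0" else "0" for b in q)


def satisfies_star(h: Callable[[str], str], max_len: int) -> bool:
    """Requirement (*) for a total h, checked on all words of length <= max_len."""
    words = ["".join(b) for n in range(max_len + 1) for b in product("01", repeat=n)]
    return all(is_prefix(h(p), h(q)) for p in words for q in words if is_prefix(p, q))


def partial_outputs(p: str) -> List[str]:
    """What g can already print while f(p) arrives one letter at a time."""
    out = f(p)
    return [g(out[:k]) for k in range(len(out) + 1)]
```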

An enumeration theorem holds for the $\varphi$'s satisfying (*), which allows one to prove an invariance theorem and to define a so-called process complexity $K^{proc} : \{0,1\}^* \to \mathbb{N}$. The same remarkable property of Levin's monotone complexity also holds with Schnorr process complexity, cf. §5.2.3.

3.3 Prefix (or self-delimited) complexity 

Levin ([30], 1974), Gács ([18], 1974) and Chaitin ([10], 1975) introduced the most successful variant of Kolmogorov complexity: the prefix complexity. The idea is to restrict the family of partial computable functions $\{0,1\}^* \to \mathbb{O}$ (recall that $\mathbb{O}$ denotes an elementary set in the sense of Definition 1.9) to those which have prefix-free domains, i.e., any two words in the domain are incomparable with respect to the prefix ordering.

An enumeration theorem holds for the $\varphi$'s with prefix-free domain, which allows one to prove an invariance theorem and to define the so-called prefix complexity $H : \{0,1\}^* \to \mathbb{N}$ (not to be confused with the entropy of a family of frequencies, cf. §1.2.3).

Theorem 3.2. Among the $K_\varphi$'s, where $\varphi : \{0,1\}^* \to \mathbb{O}$ varies over partial computable functions with prefix-free domain, there exists a smallest one, up to a constant. This smallest one (defined up to a constant), denoted by $H^{\mathbb{O}}$, is called the prefix complexity.
This prefix-free condition on the domain may seem rather technical. A conceptual meaning of this condition has been given by Chaitin in terms of self-delimitation.

Proposition 3.3 (Chaitin, [10], 1975). A partial computable function $\varphi : \{0,1\}^* \to \mathbb{O}$ has prefix-free domain if and only if it can be computed by a Turing machine $\mathcal{M}$ with the following property:

If $p$ is in $domain(\varphi)$ (i.e., $\mathcal{M}$ on input $p$ halts in an accepting state at some step) then the head of the input tape of $\mathcal{M}$ reads the entire input $p$ but never moves to the cell to the right of $p$.

This means that $p$, interpreted as a program, has no need of external action (such as an end-of-file symbol) to know where it ends: as Chaitin says, the program is self-delimited. A comparison can be made with biological phenomena. For instance, the hand of a person grows during childhood and then stops growing. No external action prevents the hand from growing further. There is something inside the genetic program which creates a halting signal so that the hand stops growing.

The main reason for the success of the prefix complexity is that, with prefix-free domains, one can use the Kraft-Chaitin inequality (cf. the proof of Theorem 1.3 in §1.2.4) and get remarkable properties.

Theorem 3.4 (Kraft-Chaitin inequality). A sequence (resp. computable sequence) $(n_i)_{i \in \mathbb{N}}$ of non negative integers is the sequence of lengths of a prefix-free (resp. computable prefix-free) family of words $(u_i)_{i \in \mathbb{N}}$ if and only if $\sum_{i \in \mathbb{N}} 2^{-n_i} \le 1$.
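The computable direction of Theorem 3.4 rests on an online allocation: each length $n_i$ must receive its word $u_i$ as soon as it is requested, without knowing the future requests. The following Python sketch implements one standard way of doing this (not necessarily the exact proof the authors have in mind): keep a set of free words with pairwise distinct lengths; serve a request $n$ with the longest free word $w$ of length $\le n$, give away $w0^{n-|w|}$, and put $w1, w01, \ldots, w0^{n-|w|-1}1$ back in the free set. If no free word of length $\le n$ exists, the remaining measure is $< 2^{-n}$, so the Kraft sum exceeds 1.

```python
from typing import Dict, List


def kraft_chaitin(lengths: List[int]) -> List[str]:
    """Online allocation of a prefix-free family (u_i) with |u_i| = lengths[i].

    Invariant: the free words have pairwise distinct lengths and, together with the words
    already given away, they partition {0,1}^N into disjoint cylinders.
    Raises ValueError as soon as the requested lengths have Kraft sum > 1.
    """
    free: Dict[int, str] = {0: ""}      # length -> free word (lengths are pairwise distinct)
    code: List[str] = []
    for n in lengths:
        candidates = [k for k in free if k <= n]
        if not candidates:
            raise ValueError("sum of 2^-n_i exceeds 1")
        k = max(candidates)               # the longest free word of length <= n
        w = free.pop(k)
        for j in range(n - k):            # keep w1, w01, ..., w0^(n-k-1)1 free
            free[k + j + 1] = w + "0" * j + "1"
        code.append(w + "0" * (n - k))    # serve the request with w0^(n-k)
    return code


def is_prefix_free(words: List[str]) -> bool:
    """No word of the family is a prefix of another one (distinct positions)."""
    return all(i == j or not v.startswith(u)
               for i, u in enumerate(words) for j, v in enumerate(words))
```

For instance, `kraft_chaitin([1, 2, 3, 3])` returns `['0', '10', '110', '111']`, and requests may arrive in any order.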

Let us state the most spectacular property of the prefix complexity. 

Theorem 3.5 (The Coding Theorem, Levin [30], 1974). Consider the family $\ell_1^{c.e.}$ of sequences of non negative real numbers $(r_x)_{x \in \mathbb{O}}$ such that

• $\sum_{x \in \mathbb{O}} r_x < +\infty$ (i.e., the series is summable),

• $\{(x, q) \in \mathbb{O} \times \mathbb{Q} \mid q < r_x\}$ is computably enumerable (i.e., the $r_x$'s have c.e. left cuts in the set of rational numbers $\mathbb{Q}$ and this is uniform in $x$).

The sequence $(2^{-H^{\mathbb{O}}(x)})_{x \in \mathbb{O}}$ is in $\ell_1^{c.e.}$ and, up to a multiplicative factor, it is the largest sequence in $\ell_1^{c.e.}$. This means that

$$\forall (r_x)_{x \in \mathbb{O}} \in \ell_1^{c.e.}\ \exists c\ \forall x \in \mathbb{O}\quad r_x \le c\, 2^{-H^{\mathbb{O}}(x)}$$

In particular, consider a countably infinite alphabet $A$. Let $V : \{0,1\}^* \to A$ be a partial computable function with prefix-free domain such that $H^A = K_V$. Consider the prefix code $(p_a)_{a \in A}$ such that, for each letter $a \in A$, $p_a$ is a shortest binary string such that $V(p_a) = a$. Then, for every probability distribution $P : A \to [0, 1]$ over the letters of the alphabet $A$ which is computably approximable from below (i.e., $\{(a, q) \in A \times \mathbb{Q} \mid q < P(a)\}$ is computably enumerable), we have

$$\forall a \in A\quad P(a) \le c\, 2^{-H^A(a)}$$

for some $c$ which depends on $P$ but not on $a \in A$. This inequality is the reason why the sequence $(2^{-H^A(a)})_{a \in A}$ is also called the universal a priori probability (though, strictly speaking, it is not a probability since the $2^{-H^A(a)}$'s do not sum up to 1).

3.4 Oracular Kolmogorov complexity 

As is always the case in computability theory, everything relativizes to any oracle $Z$. Relativization modifies the equation given at the start of §1.4.4, which is now

description = program of a partial $Z$-computable function

and for each possible oracle $Z$ there exists a Kolmogorov complexity relative to oracle $Z$.

Oracles in computability theory can also be considered as second-order arguments of computable or partial computable functionals. The same holds with oracular Kolmogorov complexity: the oracle $Z$ can be seen as a second-order condition for a second-order conditional Kolmogorov complexity

$$K(y \mid Z) \quad\text{where}\quad K(\ \mid\ ) : \mathbb{O} \times P(\mathbb{X}) \to \mathbb{N}$$

which has the advantage that the unavoidable constant in the "up to a constant" properties does not depend on the particular oracle. It depends solely on the considered functional.

Finally, one can mix first-order and second-order conditions, leading to a conditional Kolmogorov complexity with both first-order and second-order conditions

$$K(y \mid z, Z) \quad\text{where}\quad K(\ \mid\ ,\ ) : \mathbb{O} \times \mathbb{I} \times P(\mathbb{X}) \to \mathbb{N}$$

We shall see in §5.6.2 an interesting property involving oracular Kolmogorov complexity.

3.5 Sub-oracular Kolmogorov complexity 

Going back to the idea of possibly infinite computations as in §3.1, let us define $K^\infty : \{0,1\}^* \to \mathbb{N}$ such that

$$K^\infty(x) = \min\{|p| \mid U(p) = x\}$$

where $U$ is the map $\{0,1\}^* \to \{0,1\}^{\le\omega}$ computed by a universal Turing machine with possibly infinite computations. This complexity lies between $K$ and $K(\ \mid \emptyset')$ (where $\emptyset'$ is a computably enumerable set which encodes the halting problem):

$$\forall x\quad K(x \mid \emptyset') \le K^\infty(x) + O(1) \le K(x) + O(1)$$

This complexity is studied in [2], 2005, by Becher, Figueira, Nies & Picchi, and also in our paper [17], 2006.
4 Formalization of randomness: finite objects



4.1 Sciences of randomness: probability theory 

Random objects (words, integers, reals,...) constitute the basic intuition for probabilities... but they are not considered per se. No formal definition of random object is given: there seems to be no need for such a formal concept. The existing formal notion of random variable has nothing to do with randomness: a random variable is merely a measurable function which can be as non random as one likes.

It sounds strange that the mathematical theory which deals with randomness sets aside the natural basic questions:

• What is a random string?

• What is a random infinite sequence?

When questioned, people in probability theory agree that they skip these questions but do not feel sorry about it. As it is, the theory deals with laws of randomness and is so successful that it can do without entering this problem.

This may seem to be analogous to what is the case in geometry. What are points, lines, planes? No definition is given, only relations between them. Giving up the quest for an analysis of the nature of geometrical objects in favor of the axiomatic method has been a considerable scientific step.
However, we contest such an analogy. Random objects are heavily used in many areas of science and technology: sampling, cryptology... Of course, such objects are in fact "as random as we can make them", which means fake randomness. But they refer to an ideal notion of randomness which cannot be simply disregarded.

In fact, since Pierre Simon de Laplace (1749-1827), some probabilists never gave up the idea of formalizing the notion of random object. Let us cite particularly Richard von Mises (1883-1953) and Kolmogorov. In fact, it is quite impressive that, having so brilliantly and efficiently axiomatized probability theory via measure theory in [53], 1933, Kolmogorov was not fully satisfied with such foundations⁸. And he kept a keen interest in the quest for a formal notion of randomness initiated by von Mises in the 20's.

4.2 The 100 heads paradoxical result in probability theory 

That probability theory fails to completely account for randomness is strongly witnessed by the following paradoxical fact. In probability theory, if we toss an unbiased coin 100 times then 100 heads are just as probable as any other outcome! Who really believes that?

The axioms of probability theory, as developed by Kolmogorov, do not solve all mysteries that they are sometimes supposed to.

8 Kolmogorov is one of the rare probabilists, up to now, not to believe that Kolmogorov's axioms for probability theory constitute the last word about formalizing randomness...
Gács, [19], 1993



4.3 Sciences of randomness: cryptology 

Contrary to probability theory, cryptology heavily uses random objects. Though again no formal definition is given, random sequences are produced which are not fully random, just hard enough so that the mechanism which produces them cannot be discovered in reasonable time.

Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin. For, as has been pointed out
several times, there is no such thing as a random number — there are 
only methods to produce random numbers, and a strict arithmetical 
procedure is of course not such a method. 

Von Neumann, [40], 1951

So, what is "true" randomness? Is there something like a degree of randomness? Presently, (fake) randomness only means passing some statistical tests. One can ask for more.

4.4 Kolmogorov's proposal: incompressible strings 

We now assume that O = {0,1}*, i.e., we restrict to words. 

4.4.1 Incompressibility with Kolmogorov complexity 

Though much work had been devoted to getting a mathematical theory of random objects, notably by von Mises ([35, 36], 1919-1939), none was satisfactory up to the 60's, when Kolmogorov based such a theory on Kolmogorov complexity, hence on computability theory.

The theory was, in fact, independently developed by Gregory J. Chaitin (b. 1947) [6, 7], who submitted both papers in 1965.

The basic idea is as follows: 

• the larger the Kolmogorov complexity of a text, the more random the text,

• the larger its information content, and the less compressible the text.

Thus, a theory for measuring the information content is also a theory of randomness.

Recall that there exists $c$ such that for all $x \in \{0,1\}^*$, $K(x) \le |x| + c$ (Proposition 1.15). The reason is that there is a "stupid" program of length about $|x|$ which computes the word $x$ by telling what the successive letters of $x$ are. The intuition of incompressibility is as follows: $x$ is incompressible if there is no shorter way to get $x$.

Of course, we are not going to define absolute randomness for words, but rather a measure of randomness telling how far $K(x)$ is from $|x|$.
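Since $K$ is not computable (see Section 2), it cannot be evaluated in practice. As a rough illustration only, one may use a real-world compressor, here Python's `zlib` module (DEFLATE), as a stand-in: the compressed length is a computable upper bound on the length of one particular kind of description, nothing more. Note that the "pseudo-random" bytes below come from a seeded generator, so in the sense of Kolmogorov complexity they are actually very compressible (a short program plus the seed produces them); they only look random to `zlib`.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib compression of data (level 9): a crude, computable proxy."""
    return len(zlib.compress(data, 9))


zeros = bytes(1000)                              # 1000 zero bytes
noise = random.Random(0).randbytes(1000)         # seeded pseudo-random bytes (Python 3.9+)
mixed = bytes(500) + random.Random(1).randbytes(500)

for name, data in [("zeros", zeros), ("noise", noise), ("mixed", mixed)]:
    print(name, len(data), compressed_length(data))
```

The run of zeros shrinks to a few dozen bytes, the noise does not shrink at all, and the mixed string is compressed roughly by the length of its constant half, in the spirit of point 1 of Example 4.6.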

9 For a detailed analysis of who did what, and when, see Li & Vitányi's book [31], p. 89-92.
Definition 4.1 (Measure of incompressibility).
A word $x$ is $c$-incompressible if $K(x) \ge |x| - c$.

It is rather intuitive that most things are random. The next Proposition formalizes this idea.

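Proposition 4.2. For any $n$, the proportion of $c$-incompressible strings of length $n$ is $\ge 1 - 2^{-c}$.

Proof. There are at most $2^{n-c} - 1$ programs of length $< n - c$, hence at most $2^{n-c} - 1$ strings with complexity $< n - c$, while there are $2^n$ strings of length $n$. So the proportion of strings of length $n$ which are not $c$-incompressible is $< 2^{-c}$. □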
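The counting in this proof can be double-checked mechanically. This short Python sketch (function names are ours) counts the binary words of length $< n - c$ by brute force, each being a potential program, and derives the resulting lower bound on the proportion of $c$-incompressible strings of length $n$ with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def count_words_shorter_than(m: int) -> int:
    """Number of binary words of length < m, by brute-force enumeration (equals 2^m - 1)."""
    return sum(1 for k in range(m) for _ in product("01", repeat=k))


def incompressible_lower_bound(n: int, c: int) -> Fraction:
    """Each program of length < n - c describes at most one string, so at most
    count_words_shorter_than(n - c) strings of length n have complexity < n - c."""
    return 1 - Fraction(count_words_shorter_than(n - c), 2 ** n)
```

For $n = 10$ and $c = 3$, for example, this gives a proportion $\ge 1 - 127/1024$, i.e., more than 87%.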

4.4.2 Incompressibility with length conditional Kolmogorov complexity

We observed in §1.2.3 that the entropy of a word of the form $000\ldots0$ is null, i.e., entropy did not consider the information conveyed by the length.
Here, with incompressibility based on Kolmogorov complexity, we can also ignore the information content conveyed by the length by considering incompressibility based on length conditional Kolmogorov complexity.

Definition 4.3 (Measure of length conditional incompressibility). A word $x$ is length conditional $c$-incompressible if $K(x \mid |x|) \ge |x| - c$.

The same simple counting argument yields the following Proposition.

Proposition 4.4. For all $n$, the proportion of length conditional $c$-incompressible strings of length $n$ is $\ge 1 - 2^{-c}$.

A priori, length conditional incompressibility is stronger than mere incompressibility. However, the two notions of incompressibility are about the same... up to a constant.

Proposition 4.5. There exists $d$ such that, for all $c \in \mathbb{N}$ and $x \in \{0,1\}^*$

1. $x$ is length conditional $c$-incompressible $\Rightarrow$ $x$ is $(c + d)$-incompressible

2. $x$ is $c$-incompressible $\Rightarrow$ $x$ is length conditional $(2c + d)$-incompressible.

Proof. 1 is trivial. For 2, we may assume $|x| - K(x \mid |x|) \ge 1$, otherwise there is nothing to prove. First observe that there exists $e$ such that, for all such $x$,

$$(*)\qquad K(x) \le K(x \mid |x|) + 2K(|x| - K(x \mid |x|)) + e$$

In fact, if $K = K_\varphi$ and $K(\ \mid\ ) = K_\psi(\ \mid\ )$, consider $p, q$ such that

$$|x| - K(x \mid |x|) = \varphi(p) \qquad \psi(q \mid |x|) = x$$
$$K(|x| - K(x \mid |x|)) = |p| \qquad K(x \mid |x|) = |q|$$

With $p$ and $q$, hence with the pair $\langle p, q \rangle$, one can successively get:

- $|x| - K(x \mid |x|)$: this is $\varphi(p)$,
- $K(x \mid |x|)$: this is $|q|$,
- $|x|$: just sum the above quantities,
- $x$: this is $\psi(q \mid |x|)$.

Thus, $K(x) \le |\langle p, q \rangle| + O(1)$. Since the pair is coded so that $|\langle p, q \rangle| \le 2|p| + |q| + O(1)$, we get (*).

Using $K^{\mathbb{N}}(n) \le \log(n) + c_1$ (cf. Proposition 1.15) and the hypothesis $K^{\{0,1\}^*}(x) \ge |x| - c$, (*) yields

$$|x| - K(x \mid |x|) \le 2\log(|x| - K(x \mid |x|)) + 2c_1 + c + e$$

Finally, observe that $z \le 2\log z + k$ ensures $z \le \max(16, 2k)$, since $2\log z \le z/2$ for $z \ge 16$. Hence $|x| - K(x \mid |x|) \le 2c + \max(16, 4c_1 + 2e)$, which gives 2 with $d = \max(16, 4c_1 + 2e)$ (enlarged if necessary so that 1 also holds). □
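The last step of the proof only needs that $z \le 2\log z + k$ forces $z$ to be at most linear in $k$. Here is a brute-force Python check of the bound $z \le \max(16, 2k)$ for small integers $k$ (a sanity check, not a proof; the threshold 16 is where $2\log z \le z/2$ starts to hold).

```python
from math import log2


def largest_z(k: int, z_max: int = 2000) -> int:
    """Largest integer 1 <= z <= z_max such that z <= 2*log2(z) + k (z = 2 always qualifies)."""
    return max(z for z in range(1, z_max + 1) if z <= 2 * log2(z) + k)


# For every k checked, the largest solution never exceeds max(16, 2k).
print(all(largest_z(k) <= max(16, 2 * k) for k in range(100)))   # True
```

For instance `largest_z(6)` is 13, which exceeds $2k = 12$: a threshold independent of $k$ is really needed.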

4.5 Incompressibility is randomness: Martin-Löf's argument

Now, if incompressibility is clearly a necessary condition for randomness, how do we argue that it is a sufficient condition? Contraposing the wanted implication, let us see that if a word fails some statistical test then it is not incompressible. We consider some spectacular failures of statistical tests.

Example 4.6.

1. [Constant half length prefix] For all $n$ large enough, a string $0^n u$ with $|u| = n$ cannot be $c$-incompressible.

2. [Palindromes] Large enough palindromes cannot be $c$-incompressible.

3. [0 and 1 not equidistributed] For all $0 < \alpha < 1$, for all $n$ large enough, a string of length $n$ which has $\le \alpha \frac{n}{2}$ zeros cannot be $c$-incompressible.

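Proof. 1. Let $c'$ be such that $K(x) \le |x| + c'$. Observe that there exists $c''$ such that $K(0^n u) \le K(u) + c''$ (since $|u| = n$, the word $0^n u$ is computable from $u$), hence

$$K(0^n u) \le n + c' + c'' \le \tfrac{1}{2}|0^n u| + c' + c''$$

So that $K(0^n u) \ge |0^n u| - c$ is impossible for $n$ large enough.

2. Same argument: there exists $c''$ such that, for any palindrome $x$,

$$K(x) \le \tfrac{1}{2}|x| + c''$$

3. The proof follows the classical argument to get the law of large numbers (cf. Feller's book [15]). Let us do it for $\alpha = \frac{2}{3}$, so that

$$\frac{\alpha}{2} = \frac{1}{3}$$

and, for simplicity, let us assume that $n$ is a multiple of 3. Let $A_n$ be the set of strings of length $n$ with $\le \frac{n}{3}$ zeros. We estimate the number $N$ of elements of $A_n$:

$$N = \sum_{i=0}^{n/3} \binom{n}{i} = \binom{n}{n/3} + \binom{n}{n/3 - 1} + \cdots + \binom{n}{0} \le \left(\frac{n}{3} + 1\right) \binom{n}{n/3}$$

Use the inequality $1 \le e^{\frac{1}{12n}} \le 1.1$ and Stirling's formula (1730),

$$\sqrt{2\pi n}\left(\frac{n}{e}\right)^n \le n! \le \sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{\frac{1}{12n}}$$

Applying the upper bound to $n!$ and the lower bound to $(n/3)!$ and $(2n/3)!$ gives $\binom{n}{n/3} \le 1.1 \cdot \frac{3}{2\sqrt{\pi n}} \left(\frac{3}{\sqrt[3]{4}}\right)^n$. Observe that $1.1\left(\frac{n}{3} + 1\right) < n$ for $n > 2$. Therefore,

$$N < n \cdot \frac{3}{2\sqrt{\pi n}} \left(\frac{3}{\sqrt[3]{4}}\right)^n = \frac{3}{2}\sqrt{\frac{n}{\pi}} \left(\frac{3}{\sqrt[3]{4}}\right)^n$$

Using Proposition 1.16 (and $\frac{3}{2\sqrt{\pi}} < 1$), for any element $x$ of $A_n$, we have

$$K(x \mid n) \le \log(N) + d < n \log\left(\frac{3}{\sqrt[3]{4}}\right) + \frac{\log n}{2} + d$$

Since $\frac{27}{4} < 8$, we have $\frac{3}{\sqrt[3]{4}} < 2$ and $\log\left(\frac{3}{\sqrt[3]{4}}\right) < 1$. Now, if $x$ were $c$-incompressible then, by Proposition 4.5, it would be length conditional $(2c + e)$-incompressible for some constant $e$, i.e., $K(x \mid n) \ge n - 2c - e$. Hence $n - 2c - e < n \log\left(\frac{3}{\sqrt[3]{4}}\right) + \frac{\log n}{2} + d$, which is impossible for $n$ large enough.

So that $x$ cannot be $c$-incompressible. □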
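The estimate of $N = \sharp A_n$ in point 3 can be checked numerically. This Python sketch computes $N$ exactly with `math.comb` (Python 3.8+) and compares it, for $n$ a multiple of 3, with the bound $\frac{3}{2}\sqrt{n/\pi}\,(3/\sqrt[3]{4})^n$ and with its logarithmic form $n\log(3/\sqrt[3]{4}) + \frac{\log n}{2}$ (the names `RATIO` and `size_A` are ours).

```python
from math import comb, log2, pi, sqrt

RATIO = 3 / 4 ** (1 / 3)   # 3 / (cube root of 4), about 1.89


def size_A(n: int) -> int:
    """N = number of binary words of length n with at most n/3 zeros."""
    return sum(comb(n, i) for i in range(n // 3 + 1))


for n in (30, 300, 600):
    N = size_A(n)
    print(n, log2(N), n * log2(RATIO) + log2(n) / 2)   # the first value stays below the second
```

Since $\log(3/\sqrt[3]{4}) \approx 0.918 < 1$, the proportion $N / 2^n$ of such words tends to $0$ exponentially fast.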

Let us give a common framework to the three above examples so as to get some flavor of what a statistical test can be. To do this, we follow the above proofs of compressibility.

Example 4.7.

1. [Constant left half length prefix]

Set $V_m$ = all strings with $m$ zeros ahead. The sequence $V_0, V_1, \ldots$ is decreasing. The number of strings of length $n$ in $V_m$ is $0$ if $m > n$ and $2^{n-m}$ if $m \le n$. Thus, the proportion $\frac{\sharp\{x \mid |x| = n \wedge x \in V_m\}}{2^n}$ of length $n$ words which are in $V_m$ is $\le 2^{-m}$.

2. [Palindromes] Put in $V_m$ all strings $x$ with $|x| \ge 2m$ whose length $m$ prefix is the mirror image of their length $m$ suffix. The sequence $V_0, V_1, \ldots$ is decreasing. The number of strings of length $n$ in $V_m$ is $0$ if $m > \frac{n}{2}$ and $2^{n-m}$ if $m \le \frac{n}{2}$. Thus, the proportion of length $n$ words which are in $V_m$ is $\le 2^{-m}$.

3. [0 and 1 not equidistributed] Put in $V_m$ all strings $x$ such that the number of zeros is $\le (\alpha + (1 - \alpha)2^{-m})\frac{|x|}{2}$. The sequence $V_0, V_1, \ldots$ is decreasing. A computation analogous to that done in the proof of the law of large numbers shows that the proportion of length $n$ words which are in $V_m$ is $\le 2^{-\gamma m}$ for some $\gamma > 0$ (independent of $m$).
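The counts in points 1 and 2 can be verified by brute force. In this Python sketch, `in_V_prefix` and `in_V_mirror` are our own names for the membership relations $x \in V_m$ of the first two tests (for palindromes we use the mirror-image reading: $|x| \ge 2m$ and the length $m$ prefix of $x$ is the mirror image of its length $m$ suffix); proportions are computed exactly with fractions.

```python
from fractions import Fraction
from itertools import product


def in_V_prefix(x: str, m: int) -> bool:
    """Test 1: x starts with m zeros."""
    return len(x) >= m and x[:m] == "0" * m


def in_V_mirror(x: str, m: int) -> bool:
    """Test 2: |x| >= 2m and the length-m prefix of x is the mirror image of its length-m suffix."""
    return len(x) >= 2 * m and x[:m] == x[len(x) - m:][::-1]


def proportion(in_V, n: int, m: int) -> Fraction:
    """Exact proportion of the words of length n which lie in V_m."""
    words = ("".join(b) for b in product("01", repeat=n))
    return Fraction(sum(in_V(x, m) for x in words), 2 ** n)
```

Both tests attain exactly the proportion $2^{-m}$ whenever $V_m$ has words of length $n$, which is the normalization used in the definition below.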

Now, what about other statistical tests? But what is a statistical test? A convincing formalization has been developed by Martin-Löf. The intuition is that illustrated in Example 4.7, augmented with the following feature: each $V_m$ is computably enumerable and so is the relation $\{(m, x) \mid x \in V_m\}$. This feature is analogous to the partial computability assumption in the definition of Kolmogorov complexity.
Definition 4.8 [Abstract notion of statistical test, Martin-Löf 1964]. A statistical test is a family of nested critical sets

$$\{0,1\}^* \supseteq V_0 \supseteq V_1 \supseteq V_2 \supseteq \cdots \supseteq V_m \supseteq \cdots$$

such that $\{(m, x) \mid x \in V_m\}$ is computably enumerable and the proportion $\frac{\sharp\{x \mid |x| = n \wedge x \in V_m\}}{2^n}$ of length $n$ words which are in $V_m$ is $\le 2^{-m}$.

Intuition. The bound $2^{-m}$ is just a normalization. Any bound $b(m)$ such that $b : \mathbb{N} \to \mathbb{Q}$ is computable, decreasing and with limit $0$ could replace $2^{-m}$. The significance of $x \in V_m$ is that the hypothesis "$x$ is random" is rejected with significance level $2^{-m}$.

Remark 4.9. Instead of sets $V_m$ one can consider a function $\delta : \{0,1\}^* \to \mathbb{N}$ such that $\frac{\#\{x \mid |x| = n \wedge \delta(x) \ge m\}}{2^n} \le 2^{-m}$ and $\delta$ is computable from below, i.e., $\{(m,x) \mid \delta(x) \ge m\}$ is computably enumerable.
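The passage between nested sets and a function $\delta$ can be sketched as follows, assuming (for illustration only) that membership in each $V_m$ is decidable, whereas the remark only requires $\delta$ to be computable from below. The names `has_zero_prefix`, `delta_from_sets`, `sets_from_delta` and `satisfies_bound` are hypothetical; the example test is the "zeros ahead" test of Example 4.7, item 1, for which $\delta(x)$ is the number of leading zeros of $x$.

```python
from itertools import product

def has_zero_prefix(x, m):
    """Membership in V_m for the 'zeros ahead' test of Example 4.7, item 1."""
    return x.startswith("0" * m)

def delta_from_sets(member, x, max_m):
    """delta(x) = largest m <= max_m such that member(x, m) holds (0 if none).
    Because the V_m are nested, we can stop at the first m that fails."""
    m = 0
    while m < max_m and member(x, m + 1):
        m += 1
    return m

def sets_from_delta(delta, m):
    """Recover V_m as the predicate x -> delta(x) >= m."""
    return lambda x: delta(x) >= m

def satisfies_bound(delta, n, m):
    """Check #{x : |x| = n and delta(x) >= m} <= 2^(n-m), i.e. proportion <= 2^-m."""
    count = sum(delta("".join(b)) >= m for b in product("01", repeat=n))
    return count * 2 ** m <= 2 ** n
```

For instance, `delta_from_sets(has_zero_prefix, x, 64)` computes the number of leading zeros of any word $x$ of length at most 64.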

We have just argued on some examples that all statistical tests from practice are of the form stated by Definition 4.8. Now comes Martin-Löf's fundamental result about statistical tests, which is in the vein of the invariance theorem.

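Theorem 4.10 (Martin-Löf, 1965). Up to a constant shift, there exists a largest statistical test $(U_m)_{m\in\mathbb{N}}$:

$$\forall (V_m)_{m\in\mathbb{N}}\ \exists c\ \forall m\quad V_{m+c} \subseteq U_m$$

In terms of functions, up to an additive constant, there exists a largest statistical test $\Delta$:

$$\forall \delta\ \exists c\ \forall x\quad \delta(x) \le \Delta(x) + c$$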
Proof. Consider $\Delta(x) = |x| - K(x \mid |x|) - 1$.

$\Delta$ is a test. Clearly, $\{(m,x) \mid \Delta(x) \ge m\}$ is computably enumerable. Moreover, $\Delta(x) \ge m$ means $K(x \mid |x|) \le |x| - m - 1$. So the set $\{x \mid \Delta(x) \ge m \wedge |x| = n\}$ has no more elements than there are programs of length $\le n - m - 1$, namely $2^{n-m} - 1 < 2^{n-m}$.

$\Delta$ is largest. Let $\delta$ be a test and let $x$ be a word of length $n$. Then $x$ is determined by its rank (in enumeration order) in the set $\{z \mid \delta(z) \ge \delta(x) \wedge |z| = |x|\}$. Since this set has $\le 2^{n-\delta(x)}$ elements, the rank of $x$ has a binary representation of length $\le |x| - \delta(x)$. Add useless zeros ahead to get a word $p$ of length $|x| - \delta(x)$. With $p$ we get $|x| - \delta(x)$. With $|x| - \delta(x)$ and $|x|$ we get $\delta(x)$ and can enumerate the above set. With $p$ we get the rank of $x$ in this set, hence we get $x$. Thus,

$$K(x \mid |x|) \le |x| - \delta(x) + c, \quad \text{i.e.,} \quad \delta(x) \le \Delta(x) + c + 1. \qquad \square$$
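The coding argument in the second half of the proof can be made concrete. The Python sketch below is a simplified, hypothetical illustration: it assumes a decidable test $\delta$ (the "zeros ahead" test of Example 4.7, item 1) and lists the level set in lexicographic order, whereas a merely computably enumerable test would require enumeration order. The hypothetical `encode` produces the padded rank $p$ of length $|x| - \delta(x)$, and `decode` recovers $x$ from $p$ and $|x|$ alone, following the steps of the proof.

```python
from itertools import product

def leading_zeros(x):
    """A sample test delta (hypothetical choice): number of zeros ahead."""
    return len(x) - len(x.lstrip("0"))

def level_set(delta, n, m):
    """Length n words z with delta(z) >= m, in lexicographic order.
    (For a merely c.e. test one would use enumeration order instead.)"""
    return [w for w in ("".join(b) for b in product("01", repeat=n))
            if delta(w) >= m]

def encode(delta, x):
    """Word p of length |x| - delta(x): the rank of x, padded with zeros ahead."""
    n, m = len(x), delta(x)
    rank = level_set(delta, n, m).index(x)
    width = n - m
    # rank < 2^width because the level set has at most 2^(n-m) elements
    return format(rank, "b").zfill(width) if width > 0 else ""

def decode(delta, p, n):
    """Recover x from p and n = |x|: m = n - |p|, then take the rank."""
    m = n - len(p)
    rank = int(p, 2) if p else 0
    return level_set(delta, n, m)[rank]
```

The length of `p` is what bounds $K(x \mid |x|)$ in the proof: a decoder given $|x|$ needs nothing beyond `p`.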

The importance of the previous result lies in the following corollary, which ensures that, for words, incompressibility implies (hence is equivalent to) randomness.
Corollary 4.11 (Martin-Löf, 1965). Incompressibility passes all statistical tests. I.e., for all $c$, for every statistical test $(V_m)_m$, there exists $d$ such that

$$\forall x\ (x \text{ is } c\text{-incompressible} \Rightarrow x \notin V_{c+d})$$

Proof. Let $x$ be length conditional $c$-incompressible. This means that $K(x \mid |x|) \ge |x| - c$. Hence $\Delta(x) = |x| - K(x \mid |x|) - 1 \le c - 1$, which means that $x \notin U_c$.

Let now $(V_m)_m$ be a statistical test. Then there is some $d$ such that $V_{m+d} \subseteq U_m$ for all $m$. Therefore $x \notin V_{c+d}$. □

Remark 4.12. Observe that incompressibility is a bottom-up notion: we look at the value of $K(x)$ (or that of $K(x \mid |x|)$).

On the contrary, passing statistical tests is a top-down notion. To pass all statistical tests amounts to an inclusion in an intersection: namely, an inclusion in

$$\bigcap_{(V_m)_m} \bigcup_{c} \overline{V_c}$$

where $\overline{V_c} = \{0,1\}^* \setminus V_c$.

4.6 Shortest programs are random finite strings 

Observe that optimal programs to compute any object are examples of random 
strings. More precisely, the following result holds. 

Proposition 4.13. Let $O$ be an elementary set (cf. Definition 1.9) and $U : \{0,1\}^* \to \{0,1\}^*$, $V : \{0,1\}^* \to O$ be some fixed optimal functions. There exists a constant $c$ such that, for all $a \in O$, for all $p \in \{0,1\}^*$, if $V(p) = a$ and $K_V(a) = |p|$ then $K_U(p) \ge |p| - c$. In other words, for any $a \in O$, if $p$ is a shortest program which outputs $a$ then $p$ is $c$-random.

Proof. Consider the function $V \circ U : \{0,1\}^* \to O$. Using the invariance theorem, let $c$ be such that $K_V \le K_{V\circ U} + c$. Then, for every $q \in \{0,1\}^*$,

$$U(q) = p \ \Rightarrow\ V\circ U(q) = a \ \Rightarrow\ |q| \ge K_{V\circ U}(a) \ge K_V(a) - c = |p| - c$$

which proves that $K_U(p) \ge |p| - c$. □

4.7 Random finite strings and lower bounds for computational complexity

Random finite strings (or rather c-incompressible strings) have been extensively 
used to prove lower bounds for computational complexity, cf. the pioneering 
paper [42] by Wolfgang Paul, 1979, (see also an account of the proof in our 
survey paper [16]) and the work by Li & Vitanyi, [31]. The key idea is that a 
random string can be used as a worst possible input. 
5 Formalization of randomness: infinite objects



We shall stick to infinite sequences of zeros and ones: $\{0,1\}^{\mathbb{N}}$.

5.1 Martin-Löf top-down approach with topology and computability

5.1.1 The naive idea badly fails 

The naive idea of a random element of $\{0,1\}^{\mathbb{N}}$ is that of a sequence $\alpha$ which is in no set of measure $0$. Alas, $\alpha$ is always in the singleton set $\{\alpha\}$, which has measure $0$!

5.1.2 Martin-Löf's solution: effectivize

Martin-Löf's solution to the above problem is to effectivize, i.e., to consider only effective measure zero sets.

This approach is, in fact, an extension to infinite sequences of the one Martin-Löf developed for finite objects, cf. §4.5.

Let us develop a series of observations which leads to Martin-Löf's precise solution, i.e., to what "effective" means for measure zero sets.

To prove a probability law amounts to proving that a certain set $X$ of sequences has probability one. To do this, one has to prove that the complement set $Y = \{0,1\}^{\mathbb{N}} \setminus X$ has probability zero. Now, in order to prove that $Y \subseteq \{0,1\}^{\mathbb{N}}$ has probability zero, basic measure theory tells us that one has to include $Y$ in open sets with arbitrarily small probability. I.e., for each $n \in \mathbb{N}$ one must find an open set $U_n \supseteq Y$ which has probability $\le 2^{-n}$.

If things were on the real line $\mathbb{R}$, we would say that $U_n$ is a countable union of intervals with rational endpoints.

Here, in $\{0,1\}^{\mathbb{N}}$, $U_n$ is a countable union of sets of the form $u\{0,1\}^{\mathbb{N}}$ where $u$ is a finite binary string and $u\{0,1\}^{\mathbb{N}}$ is the set of infinite sequences which extend $u$. In order to prove that $Y$ has probability zero, for each $n \in \mathbb{N}$ one must find a family $(u_{n,m})_{m\in\mathbb{N}}$ such that $Y \subseteq \bigcup_m u_{n,m}\{0,1\}^{\mathbb{N}}$ and $\mathrm{Proba}(\bigcup_m u_{n,m}\{0,1\}^{\mathbb{N}}) \le 2^{-n}$ for each $n \in \mathbb{N}$.
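As a small computational illustration (not part of the original argument), the measure of a finite union of cylinders $u\{0,1\}^{\mathbb{N}}$ under the uniform measure can be computed exactly: two cylinders intersect iff one word is a prefix of the other, so after discarding words that extend another listed word, the measures $2^{-|u|}$ simply add up. The function name `cylinder_union_measure` is hypothetical.

```python
from fractions import Fraction

def cylinder_union_measure(words):
    """Uniform (fair coin) measure of the union of the cylinders u{0,1}^N.
    Two cylinders intersect iff one word is a prefix of the other, so we
    keep only the words with no proper prefix in the list; their cylinders
    are pairwise disjoint and the measures 2^-|u| add up."""
    ws = set(words)
    minimal = [u for u in ws
               if not any(v != u and u.startswith(v) for v in ws)]
    return sum((Fraction(1, 2 ** len(u)) for u in minimal), Fraction(0))
```

For instance, the single sequence $000\ldots$ is covered by $U_n = 0^n\{0,1\}^{\mathbb{N}}$, and `cylinder_union_measure(["0" * n])` returns `Fraction(1, 2 ** n)`, so this sequence lies in a set of the kind just described.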

Now, Martin-Löf makes a crucial observation: the mathematical probability laws which we consider necessarily have some effective character. And this effectiveness should be reflected in the proof as follows: the doubly indexed sequence $(u_{n,m})_{n,m\in\mathbb{N}}$ is computable.

Thus, the set $\bigcup_m u_{n,m}\{0,1\}^{\mathbb{N}}$ is a computably enumerable open set and $\bigcap_n \bigcup_m u_{n,m}\{0,1\}^{\mathbb{N}}$ is a countable intersection of a computably enumerable family of open sets.

Now comes the essential theorem, which is completely analogous to Theorem 4.10.
Definition 5.1 (Martin-Löf, [35], 1966). A constructively null $G_\delta$ set is any set of the form

$$\bigcap_n \bigcup_m u_{n,m}\{0,1\}^{\mathbb{N}}$$

where $\mathrm{Proba}(\bigcup_m u_{n,m}\{0,1\}^{\mathbb{N}}) \le 2^{-n}$ (which implies that the intersection set has probability zero) and the sequence $u_{n,m}$ is computably enumerable.

Theorem 5.2 (Martin-Löf, [32], 1966). There exists a largest constructively null $G_\delta$ set.

Let us insist that the theorem says largest, up to nothing, really largest relative 
to set inclusion. 

Definition 5.3 (Martin-Löf, [32], 1966). A sequence $\alpha \in \{0,1\}^{\mathbb{N}}$ is Martin-Löf random if it belongs to no constructively null $G_\delta$ set (i.e., if it does not belong to the largest one).

In particular, the family of random sequences, being the complement of a constructively null $G_\delta$ set, has probability 1. And the observation above Definition 5.1 ensures that Martin-Löf random sequences satisfy all usual probability laws. Notice that the last statement can be seen as an improvement of all usual probability laws: not only are such laws true with probability 1, but they are true for all sequences in the measure 1 set of Martin-Löf random sequences.

5.2 The bottom-up approach 
5.2.1 The naive idea badly fails 

Another natural naive idea to get randomness for sequences is to extend randomness from finite objects to infinite ones. The obvious proposal is to consider sequences $\alpha \in \{0,1\}^{\mathbb{N}}$ such that, for some $c$,

$$\forall n\quad K(\alpha\restriction n) \ge n - c \qquad (1)$$

However, Martin-Löf proved that there is no such sequence.

Theorem 5.4 (Large oscillations (Martin-Löf, [33], 1971)). If $f : \mathbb{N} \to \mathbb{N}$ is computable and $\sum_{n\in\mathbb{N}} 2^{-f(n)} = +\infty$ then, for every $\alpha \in \{0,1\}^{\mathbb{N}}$, there are infinitely many $k$ such that $K(\alpha\restriction k) \le k - f(k) + O(1)$.

Proof. Let us do the proof in the case $f(n) = \log n$, which is quite limpid (recall that the harmonic series $\sum_n \frac{1}{n} = \sum_n 2^{-\log n}$ has infinite sum). Let $k$ be any integer. The word $\alpha\restriction k$ prefixed with $1$ is the binary representation of an integer $n$ (we put $1$ ahead of $\alpha\restriction k$ in order to avoid a first block of non-significant zeros). We claim that $\alpha\restriction n$ can be recovered from $\alpha\restriction[k+1,n]$ and one extra bit $e$. In fact,

• $n - k$ is the length of $\alpha\restriction[k+1,n]$,

• $k = \lfloor \log n \rfloor = \lfloor \log(n-k) \rfloor + e$ (where $e \in \{0,1\}$) is known from $n - k$ and $e$,

• $n = (n - k) + k$,

• $\alpha\restriction k$ is the binary representation of $n$ with its leading $1$ removed.

The above analysis describes a computable map $\varphi : \{0,1\}^* \times \{0,1\} \to \{0,1\}^*$ such that $\alpha\restriction n = \varphi(\alpha\restriction[k+1,n], e)$. Since $\varphi$ is computable and its second argument ranges over a finite set, we get

$$K(\alpha\restriction n) \le K(\alpha\restriction[k+1,n]) + O(1) \le n - k + O(1) = n - \log n + O(1)$$

Since distinct values of $k$ give distinct values of $n$, this holds for infinitely many $n$. □
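The map $\varphi$ from the proof can be written out explicitly. In the Python sketch below (hypothetical names `split_prefix` and `recover_prefix`), Python strings are 0-indexed, so $\alpha\restriction[k+1,n]$ is the slice `alpha[k:n]`. The integer $k$ is recovered as $\lfloor \log_2(n-k) \rfloor + e$, using the fact that $1$ followed by $k$ bits has exactly $k+1$ binary digits, so that $k = \lfloor \log_2 n \rfloor$.

```python
def split_prefix(alpha, k):
    """Build n = int('1' + alpha|k) and return (tail, e), where
    tail = alpha|[k+1, n] (1-indexed bits k+1..n) and e is the extra bit
    such that k = floor(log2(n - k)) + e."""
    n = int("1" + alpha[:k], 2)
    if len(alpha) < n:
        raise ValueError("alpha must have at least n bits")
    tail = alpha[k:n]
    e = k - ((n - k).bit_length() - 1)   # bit_length() - 1 = floor(log2)
    assert e in (0, 1)                   # since 2^(k-1) <= n - k < 2^(k+1)
    return tail, e

def recover_prefix(tail, e):
    """The computable map phi: recover alpha|n from alpha|[k+1, n] and e."""
    n_minus_k = len(tail)
    k = (n_minus_k.bit_length() - 1) + e   # floor(log2(n - k)) + e
    n = n_minus_k + k
    head = format(n, "b")[1:]              # binary of n without its leading 1
    return head + tail
```

The tail handed to `recover_prefix` has $n - k$ bits with $k = \lfloor \log_2 n \rfloor$, which is where the saving of about $\log n$ bits comes from.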

5.2.2 Miller & Yu's theorem 

It took about forty years to get a characterization of randomness via Kolmogorov complexity which completes Theorem 5.4 in a very pleasant and natural way.

Theorem 5.5 (Miller & Yu, [33], 2008). The following conditions are equivalent:

i. The sequence $\alpha \in \{0,1\}^{\mathbb{N}}$ is Martin-Löf random

ii. For every total computable function $f : \mathbb{N} \to \mathbb{N}$ satisfying $\sum_{n\in\mathbb{N}} 2^{-f(n)} < +\infty$, $\exists c\ \forall k\ K(\alpha\restriction k) \ge k - f(k) - c$

iii. $\exists c\ \forall k\ K(\alpha\restriction k) \ge k - H(k) - c$

Moreover, there exists a particular total computable function $g : \mathbb{N} \to \mathbb{N}$ satisfying $\sum_{n\in\mathbb{N}} 2^{-g(n)} < +\infty$ such that one can add a fourth equivalent condition:

iv. $\exists c\ \forall k\ K(\alpha\restriction k) \ge k - g(k) - c$

Recently, an elementary proof of this theorem was given by Bienvenu, Merkle & Shen in [3], 2008. Equivalence i $\Leftrightarrow$ iii is due to Gács, [19], 1980.

5.2.3 Variants of Kolmogorov complexity and randomness 

Bottom-up characterizations of random sequences have been obtained using Levin monotone complexity, Schnorr process complexity and prefix complexity.

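Theorem 5.6. The following conditions are equivalent:

i. The sequence $\alpha \in \{0,1\}^{\mathbb{N}}$ is Martin-Löf random

ii. $\exists c\ \forall k\ |K_{mon}(\alpha\restriction k) - k| \le c$

iii. $\exists c\ \forall k\ |S(\alpha\restriction k) - k| \le c$

iv. $\exists c\ \forall k\ H(\alpha\restriction k) \ge k - c$

Equivalence i $\Leftrightarrow$ ii is due to Levin ([52], 1970). Equivalence i $\Leftrightarrow$ iii is due to Schnorr ([45], 1971). Equivalence i $\Leftrightarrow$ iv is due to Schnorr and Chaitin ([TP], 1975).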
5.3 Randomness: a robust mathematical notion



Besides the top-down definition of Martin-Löf randomness, we mentioned above diverse bottom-up characterizations via properties of the initial segments with respect to variants of Kolmogorov complexity. There are other top-down and bottom-up characterizations; we mention two of them in this section. This variety of characterizations shows that Martin-Löf randomness is a robust mathematical notion.

5.3.1 Randomness and martingales 

Recall that a martingale is a function $d : \{0,1\}^* \to \mathbb{R}^+$ such that

$$\forall u \in \{0,1\}^*\quad d(u) = \frac{d(u0) + d(u1)}{2}$$

The intuition is that a player tries to predict the bits of a sequence $\alpha \in \{0,1\}^{\mathbb{N}}$ and bets some amount of money on the values of these bits. If his guess is correct he doubles his stake, else he loses it. Starting with a positive capital $d(\varepsilon)$ (where $\varepsilon$ is the empty word), $d(\alpha\restriction k)$ is his capital after the first $k$ bits of $\alpha$ have been revealed.

The martingale $d$ wins on $\alpha \in \{0,1\}^{\mathbb{N}}$ if the capital of the player tends to $+\infty$. The martingale $d$ is computably approximable from below if the left cut of $d(u)$ is computably enumerable, uniformly in $u$ (i.e., $\{(u,q) \in \{0,1\}^* \times \mathbb{Q} \mid q < d(u)\}$ is c.e.).

Theorem 5.7 (Schnorr, [46], 1971). A sequence $\alpha \in \{0,1\}^{\mathbb{N}}$ is Martin-Löf random if and only if no martingale computably approximable from below wins on $\alpha$.
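A martingale in the above sense can be simulated with exact rational arithmetic. The sketch below is a hypothetical illustration (the names `make_martingale` and `bet_zero_at_odd_positions` are not from the text): the player stakes a fraction of his capital on a guessed bit, which yields the fairness condition $d(u0) + d(u1) = 2d(u)$. The sample strategy bets everything on $0$ at positions $1, 3, 5, \ldots$ (numbering from 1) and nothing elsewhere, so its capital doubles every two bits on any sequence of the form $0\alpha_0 0\alpha_1 0\alpha_2\ldots$. Since this martingale is computable, hence computably approximable from below, Theorem 5.7 shows that no such sequence is Martin-Löf random.

```python
from fractions import Fraction

def make_martingale(bet):
    """Build d : {0,1}* -> Q_+ from a betting rule.
    bet(u) = (b, s): stake a fraction s in [0, 1] of the current capital on
    the next bit being b. A correct guess doubles the stake, a wrong one
    loses it, hence d(u0) + d(u1) = 2 d(u)."""
    def d(u):
        capital = Fraction(1)                  # d(empty word) = 1
        for i, bit in enumerate(u):
            b, s = bet(u[:i])
            stake = capital * s
            capital += stake if bit == b else -stake
        return capital
    return d

def bet_zero_at_odd_positions(u):
    """Hypothetical strategy: the next bit has position len(u) + 1; if that
    position is odd, bet the whole capital on 0, otherwise bet nothing."""
    return ("0", Fraction(1)) if len(u) % 2 == 0 else ("0", Fraction(0))

d = make_martingale(bet_zero_at_odd_positions)
```

On the prefix of length $2k$ of $0\alpha_0 0\alpha_1\ldots$ this `d` has capital $2^k$, whatever the bits $\alpha_i$ are.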

5.3.2 Randomness and compressors 

Recently, Bienvenu & Merkle obtained quite remarkable characterizations of random sequences in the vein of Theorems 5.6 and 5.5, involving computable upper bounds of $K$ and $H$.

Definition 5.8. A compressor is any partial computable $T : \{0,1\}^* \to \{0,1\}^*$ which is one-to-one and has computable domain. A compressor is said to be prefix-free if its range is prefix-free.

Proposition 5.9.

1. If $T$ is a compressor (resp. a prefix-free compressor) then

$$\exists c\ \forall x \in \mathrm{dom}(T)\quad K(x) \le |T(x)| + c \quad (\text{resp. } H(x) \le |T(x)| + c)$$

2. For any computable upper bound $F$ of $K$ (resp. of $H$) there exists a compressor (resp. a prefix-free compressor) $T$ such that

$$\exists c\ \forall x \in \{0,1\}^*\quad |T(x)| \le F(x) + c$$
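To illustrate Definition 5.8 and point 1 of Proposition 5.9, here is a hypothetical prefix-free compressor that does not compress at all: it prepends a self-delimiting header encoding $|x|$. It is total (so its domain is computable), one-to-one, and its range is prefix-free, so point 1 yields $H(x) \le |x| + 2\log|x| + O(1)$. The names `T`, `T_inverse` and `is_prefix_free` are chosen for this sketch only.

```python
def T(x):
    """Hypothetical prefix-free compressor: 1^L 0, then the L-bit binary
    form of |x|, then x itself (so |T(x)| = |x| + 2L + 1)."""
    length_bits = format(len(x), "b")      # "0" when x is empty
    return "1" * len(length_bits) + "0" + length_bits + x

def T_inverse(y):
    """Parse one codeword from the front of y; return (x, rest) with y = T(x) + rest."""
    L = y.index("0")                       # number of leading ones
    n = int(y[L + 1 : 2 * L + 1], 2)       # the length |x|
    start = 2 * L + 1
    return y[start : start + n], y[start + n :]

def is_prefix_free(codewords):
    """In sorted order, a word that is a prefix of another listed word is
    immediately followed by a word extending it."""
    cs = sorted(codewords)
    return all(not cs[i + 1].startswith(cs[i]) for i in range(len(cs) - 1))
```

Because `T_inverse` can read a codeword off the front of a longer string, the decoder never needs an end marker, which is exactly what prefix-freeness buys.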
Now come the surprising characterizations of randomness in terms of computable functions.

Theorem 5.10 (Bienvenu & Merkle, [5], 2007). The following conditions are equivalent:

i. The sequence $\alpha \in \{0,1\}^{\mathbb{N}}$ is Martin-Löf random

ii. For every prefix-free compressor $T : \{0,1\}^* \to \{0,1\}^*$, $\exists c\ \forall k\ |T(\alpha\restriction k)| \ge k - c$

iii. For every compressor $T$, $\exists c\ \forall k\ |T(\alpha\restriction k)| \ge k - H(k) - c$

Moreover, there exist a particular prefix-free compressor $T^*$ and a particular compressor $T^\#$ such that one can add two more equivalent conditions:

iv. $\exists c\ \forall k\ |T^*(\alpha\restriction k)| \ge k - c$

v. $\exists c\ \forall k\ |T^\#(\alpha\restriction k)| \ge k - |T^*(k)| - c$

5.4 Randomness: a fragile property 

Though the notion of Martin-Löf randomness is robust, with many equivalent definitions, as a property it is quite fragile.

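In fact, random sequences lose their random character under very simple computable transformations. For instance, even if $\alpha_0\alpha_1\alpha_2\ldots$ is random, the sequence $0\alpha_0 0\alpha_1 0\alpha_2 0\ldots$, whose bits at positions $1, 3, 5, \ldots$ (numbering positions from $1$) are all $0$, is NOT random, since it fails the following Martin-Löf test:

$$\bigcap_{n\in\mathbb{N}} \{\beta \mid \forall i < n\ \ \beta(2i+1) = 0\}$$

Indeed, $\{\beta \mid \forall i < n\ \ \beta(2i+1) = 0\}$ has probability $2^{-n}$ and is an open subset of $\{0,1\}^{\mathbb{N}}$, computable uniformly in $n$.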
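The probability computation for this test can be checked by brute force. In the hypothetical sketch below, the $n$-th level is described as the union of the cylinders $w\{0,1\}^{\mathbb{N}}$ over the words $w$ of length $2n-1$ whose bits at positions $1, 3, \ldots, 2n-1$ are $0$ (positions numbered from 1, hence index `2 * i` in a Python string). Distinct words of the same length give disjoint cylinders, so the measure is just a count. The names `level_words` and `level_measure` are illustrative only.

```python
from fractions import Fraction
from itertools import product

def level_words(n):
    """Words w of length 2n - 1 whose bits at positions 1, 3, ..., 2n - 1
    (numbered from 1) are 0; level n is the union of the cylinders w{0,1}^N."""
    if n == 0:
        return [""]                        # no constraint: the whole space
    words = ("".join(b) for b in product("01", repeat=2 * n - 1))
    return [w for w in words if all(w[2 * i] == "0" for i in range(n))]

def level_measure(n):
    """Measure of level n: (number of words) * 2^-(2n - 1)."""
    length = 2 * n - 1 if n > 0 else 0
    return Fraction(len(level_words(n)), 2 ** length)
```

Each level has $2^{n-1}$ words of length $2n-1$ (for $n \ge 1$), giving measure $2^{n-1} \cdot 2^{-(2n-1)} = 2^{-n}$.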

5.5 Randomness is not chaos 

In a series of papers [37], [38], [39], 1993-1996, Joan Rand Moschovakis introduced a very convincing notion of chaotic sequence $\alpha \in \{0,1\}^{\mathbb{N}}$. It turns out that the set of such sequences has measure zero and is disjoint from the set of Martin-Löf random sequences.

This stresses that randomness is not chaos. As mentioned in §5.1.2, random sequences obey laws: those of probability theory.

5.6 Oracular randomness 
5.6.1 Relativization 

Replacing "computable" by "computable in some oracle", all the above theory relativizes in an obvious way, using oracular Kolmogorov complexity and the oracular variants.

In particular, when the oracle is the halting problem, i.e., the computably enumerable set $\emptyset'$, the obtained randomness is called 2-randomness. When the oracle is the halting problem of partial $\emptyset'$-computable functions, i.e., the set $\emptyset''$ (which is computably enumerable relative to $\emptyset'$), the obtained randomness is called 3-randomness. And so on.

Of course, 2-randomness implies randomness (which is also called 1-randomness) and 3-randomness implies 2-randomness. And so on.

5.6.2 Kolmogorov randomness and $\emptyset'$

A natural question following Theorem 5.4 is to look at the so-called Kolmogorov random sequences, which satisfy $K(\alpha\restriction k) \ge k - O(1)$ for infinitely many $k$'s. This question got a very surprising answer involving 2-randomness.

Theorem 5.11 (Nies, Stephan & Terwijn, [41], 2005). Let $\alpha \in \{0,1\}^{\mathbb{N}}$. There is a constant $c$ such that $K(\alpha\restriction k) \ge k - c$ for infinitely many $k$ (i.e., $\alpha$ is Kolmogorov random) if and only if $\alpha$ is 2-random.

5.7 Randomness: a new foundation for probability theory?

Now that there is a sound mathematical notion of randomness, is it possible/reasonable to use it as a new foundation for probability theory? Kolmogorov has been ambiguous on this question. In his first paper on the subject, see pp. 35-36 of [25], 1965, he briefly evoked that possibility:

. . . to consider the use of the [Algorithmic Information Theory] con- 
structions in providing a new basis for Probability Theory. 

However, later, see pp. 35-36 of [26], 1983, he separated both topics:

"there is no need whatsoever to change the established construction 
of the mathematical probability theory on the basis of the general 
theory of measure. I am not enclined to attribute the significance 
of necessary foundations of probability theory to the investigations 
[about Kolmogorov complexity] that I am now going to survey. But 
they are most interesting in themselves."

though stressing the role of his new theory of random objects for mathematics 
as a whole in [2B], p. 39: 

The concepts of information theory as applied to infinite sequences 
give rise to very interesting investigations, which, without being in- 
dispensable as a basis of probability theory, can acquire a certain 
value in the investigation of the algorithmic side of mathematics as 
a whole. 